Abstract — We study the fundamental network capacity of a 
multi-user wireless downlink under two assumptions: (1) channels 
are not explicitly measured, and thus instantaneous states are 
unknown; (2) channels are modeled as ON/OFF Markov chains. 
This is an important network model to explore because channel 
probing may be costly or infeasible in some contexts. In this 
case, we can use channel memory with ACK/NACK feedback 
from previous transmissions to improve network throughput. 
Computing in closed form the capacity region of this network 
is difficult because it involves solving a high dimension partially 
observed Markov decision problem. Instead, in this paper we 
construct an inner and an outer bound on the capacity region, 
showing that the bounds are tight when the number of users is large 
and the traffic is symmetric. For the case of heterogeneous traffic 
and any number of users, we propose a simple queue-dependent 
policy that can stabilize the network with any data rates strictly 
within the inner capacity bound. The stability analysis uses a 
novel frame-based Lyapunov drift argument. The outer-bound 
analysis uses stochastic coupling and state aggregation to bound 
the performance of a restless bandit problem using a related 
multi-armed bandit system. Our results are useful in cognitive 
radio networks, opportunistic scheduling with delayed/uncertain 
channel state information, and restless bandit problems. 

Index Terms — stochastic network optimization, Markovian 
channels, delayed channel state information (CSI), partially 
observable Markov decision process (POMDP), cognitive radio, 
restless bandit, opportunistic spectrum access, queueing theory, 
Lyapunov analysis. 

I. Introduction 

DUE to the increasing demand for cellular network services, 
in the past fifteen years efficient communication 
over a single-hop wireless downlink has been extensively stud- 
ied. In this paper we study the fundamental network capacity 
of a time-slotted wireless downlink under the following as- 
sumptions: (1) Channels are never explicitly probed, and thus 
their instantaneous states are never known, (2) Channels are 
modeled as two-state ON/OFF Markov chains. This network 
model is important because, due to the energy and timing 
overhead, learning instantaneous channel states by probing 
may be costly or infeasible. Even if this is feasible (when 
channel coherence time is relatively large), the time consumed 
by channel probing cannot be re-used for data transmission, 
and transmitting data without probing may achieve higher 
throughput.¹ In addition, since wireless channels can be 
adequately modeled as Markov chains [3], [4], we shall take 
advantage of channel memory to improve network throughput. 

Specifically, we consider a time-slotted wireless downlink 
where a base station serves N users through N (possibly 
different) positively correlated Markov ON/OFF channels. 
Channels are never probed so that their instantaneous states 
are unknown. In every slot, the base station selects at most one 
user to which it transmits a packet. We assume every packet 
transmission takes exactly one slot. Whether the transmission 
succeeds depends on the unknown state of the channel. At 
the end of a slot, an ACK/NACK is fed back from the served 
user to the base station. Since channels are either ON or OFF, 
this feedback reveals the channel state of the served user in 
the last slot and provides partial information of future states. 
Our goal is to characterize all achievable throughput vectors 
in this network, and to design simple throughput-achieving 
algorithms. 

We define the network capacity region Λ as the closure of 
the set of all achievable throughput vectors. We can compute 
Λ by locating its boundary points. Every boundary point can 
be computed by formulating a partially observable Markov 
decision process (POMDP) [5], whose information states are 
the probabilities that channels are ON, conditioned on the 
channel observation history. This approach, however, 
is computationally prohibitive because the information state 
space is countably infinite (which we will show later) and 
grows exponentially fast with N. 

The first contribution of this paper is that we construct 
an outer and an inner bound on Λ. The outer bound comes 
from analyzing a fictitious channel model in which every 
scheduling policy yields higher throughput than it does in the 
real network. The inner bound is the achievable rate region of 
a special class of randomized round robin policies (introduced 
in Section IV-A). These policies are simple and take advantage 
of channel memory. In the case of symmetric channels (that 
is, channels are i.i.d.) and when the network serves a large 
number of users, we show that as data rates become more balanced, 
or in a geometric sense as the direction of the data rate vector 
in the Euclidean space approaches the 45-degree angle, the 
inner bound converges geometrically fast to the outer bound, 
and the bounds are tight. This analysis uses results in [6], [7] 
that derive an outer bound on the maximum sum throughput 
for a symmetric system. 

¹ One quick example is to consider a time-slotted channel with state 
space {B, G}. Suppose channel states are i.i.d. over slots with stationary 
probabilities Pr[B] = 0.2 and Pr[G] = 0.8. At state B and G, at most 
1 and 2 packets can be successfully delivered in a slot, respectively. Packet 
transmissions beyond the capacity will all fail and need retransmissions. 
Channel probing can be done on each slot, which consumes 0.2 fraction 
of a slot. Then the policy that always probes the channel yields throughput 
0.8(2 · 0.8 + 1 · 0.2) = 1.44, while the policy that never probes the 
channel and always sends packets at rate 2 packets/slot yields throughput 
2 · 0.8 = 1.6 > 1.44. 
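The arithmetic in the footnote example with states {B, G} can be checked directly. The following is a minimal sketch of that comparison only; the function and parameter names are illustrative and do not come from the paper.

```python
def throughput_with_probing(p_good, rate_bad, rate_good, probe_fraction):
    """Always probe: lose probe_fraction of each slot, then send at the
    rate that the (now known) state supports."""
    usable = 1.0 - probe_fraction
    return usable * (rate_good * p_good + rate_bad * (1.0 - p_good))


def throughput_without_probing(p_good, rate_good):
    """Never probe and always send rate_good packets per slot; the
    transmission succeeds only when the channel is in state G."""
    return rate_good * p_good


# Numbers from the footnote: Pr[G] = 0.8, rates 1 and 2, probing cost 0.2.
probing = throughput_with_probing(0.8, 1, 2, 0.2)
blind = throughput_without_probing(0.8, 2)
print(round(probing, 2), round(blind, 2))
```

With these numbers the blind policy comes out ahead, which is the point of the footnote.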

The inner capacity bound is indeed useful. First, the struc- 
ture of the bound itself shows how channel memory improves 
throughput. Second, we show analytically that a large class 
of intuitively good heuristic policies achieve throughput that 
is at least as good as this bound, and hence the bound acts 
as a (non-trivial) performance guarantee. Finally, supporting 
throughput outside this bound may inevitably involve solving 
a much more complicated POMDP. Thus, for simplicity and 
practicality, we may regard the inner bound as an operational 
network capacity region. 

In this paper we also derive a simple queue-dependent dy- 
namic round robin policy that stabilizes the network whenever 
the arrival rate vector is interior to our inner bound. This policy 
has polynomial time complexity and is derived by a novel 
variable-length frame-based Lyapunov analysis, first used 
in [8] in a different context. This analysis is important because 
the inner bound is based on a mixture of many different types 
of round robin policies, and an offline computation of the 
proper time average mixtures needed to achieve a given point 
in this complex inner bound would require solving for Θ(2^N) 
unknowns in a linear system, which is impractical when N 
is large. The Lyapunov analysis overcomes this complexity 
difficulty with online queue-dependent decisions. 

The results of this paper apply to the emerging area of 
opportunistic spectrum access in cognitive radio networks 
(see [9] and references therein), where the channel occupancy 
of a primary user acts as a Markov ON/OFF channel to the 
secondary users. Specifically, our results apply to the important 
case where each of the secondary users has a designated 
channel and they cooperate via a centralized controller. This 
paper is also a study on efficient scheduling over wireless 
networks with delayed/uncertain channel state information 
(CSI) (see [10]-[12] and references therein). The work on 
delayed CSI that is most closely related to ours is [11], [12], 
where the authors study the capacity region and throughput- 
optimal policies of different wireless networks, assuming that 
channel states are persistently probed but fed back with delay. 
We note that our paper is significantly different. Here channels 
are never probed, and new (delayed) CSI of a channel is only 
acquired when the channel is served. Implicitly, acquiring the 
delayed CSI of any channel is part of the control decisions in 
this paper. 

This paper is organized as follows. The network model is 
given in Section II. Inner and outer bounds are constructed in 
Sections III and IV and compared in Section V in the case 
of symmetric channels. Section VI gives the queue-dependent 
policy to achieve the inner bound. 



II. Network Model 

Consider a base station transmitting data to N users through 
N Markov ON/OFF channels. Suppose time is slotted with 
normalized slots t in {0,1,2,...}. Each channel is modeled 
as a two-state ON/OFF Markov chain (see Fig. 1). 

Fig. 1. A two-state Markov ON/OFF chain for channel $n \in \{1, 2, \ldots, N\}$. 

The state evolution of channel $n \in \{1, 2, \ldots, N\}$ follows the 
transition probability matrix 

$$P_n = \begin{bmatrix} P_{n,00} & P_{n,01} \\ P_{n,10} & P_{n,11} \end{bmatrix},$$

where state ON is represented by 1 and OFF by 0, and $P_{n,ij}$ 
denotes the transition probability from state $i$ to $j$. We assume 
$P_{n,11} < 1$ for all $n$ so that no channel is constantly ON. 
Incorporating constantly ON channels like wired links is easy 
and thus omitted in this paper. We suppose channel states are 
fixed in every slot and may only change at slot boundaries. We 
assume all channels are positively correlated, which, in terms 
of transition probabilities, is equivalent to assuming 
$P_{n,11} > P_{n,01}$ or, equivalently, $P_{n,01} + P_{n,10} < 1$ for all $n$.² We suppose the base 
station keeps N queues of infinite capacity to store exogenous 
packet arrivals destined for the N users. At the beginning of 
every slot, the base station attempts to transmit a packet (if 
there is any) to a selected user. We suppose the base station has 
no channel probing capability and must select users oblivious 
of the current channel states. If a user is selected and its current 
channel state is ON, one packet is successfully delivered to 
that user. Otherwise, the transmission fails and zero packets 
are served. At the end of a slot in which the base station 
serves a user, an ACK/NACK message is fed back from the 
selected user to the base station through an independent error- 
free control channel, according to whether the transmission 
succeeds. Failing to receive an ACK is regarded as a NACK. 
Since channel states are either ON or OFF, such feedback 
reveals the channel state of the selected user in the last slot. 

Conditioning on all past channel observations, define the 
$N$-dimensional information state vector $\omega(t) = (\omega_n(t) : 1 \le n \le N)$, 
where $\omega_n(t)$ is the conditional probability that channel $n$ 
is ON in slot $t$. We assume initially $\omega_n(0) = \pi_{n,\mathrm{ON}}$ for all 
$n$, where $\pi_{n,\mathrm{ON}}$ denotes the stationary probability that channel 
$n$ is ON. As discussed in [5, Chapter 5.4], vector $\omega(t)$ is a 
sufficient statistic. That is, instead of tracking the whole system 
history, the base station can act optimally based only on $\omega(t)$. 
The base station shall keep track of the $\{\omega(t)\}$ process. 

² Assumption $P_{n,11} > P_{n,01}$ yields that the state $s_n(t)$ of channel $n$ 
has auto-covariance $E[(s_n(t) - E s_n(t))(s_n(t+1) - E s_n(t+1))] > 0$. 
In addition, we note that the case $P_{n,11} = P_{n,01}$ corresponds to a channel 
having i.i.d. states over slots. Although we can naturally incorporate i.i.d. 
channels into our model and all our results still hold, we exclude them in this 
paper because we shall show how throughput can be improved by channel 
memory, which i.i.d. channels do not have. The degenerate case where all 
channels are i.i.d. over slots is fully solved in [2]. 

We assume transition probability matrices $P_n$ for all $n$ are 
known to the base station. We denote by $s_n(t) \in \{\mathrm{OFF}, \mathrm{ON}\}$ 
the state of channel $n$ in slot $t$. Let $n(t) \in \{1, 2, \ldots, N\}$ 
denote the user served in slot $t$. Based on the ACK/NACK 
feedback, vector $\omega(t)$ is updated as follows. For $1 \le n \le N$, 

$$\omega_n(t+1) = \begin{cases} P_{n,01}, & \text{if } n = n(t),\ s_n(t) = \mathrm{OFF} \\ P_{n,11}, & \text{if } n = n(t),\ s_n(t) = \mathrm{ON} \\ \omega_n(t) P_{n,11} + (1 - \omega_n(t)) P_{n,01}, & \text{if } n \neq n(t). \end{cases} \tag{1}$$
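The update rule (1) translates almost line by line into code. The sketch below is an illustration under our own conventions: channels are indexed from 0 rather than 1, and the function name `update_belief` and its argument layout are hypothetical, not part of the paper.

```python
def update_belief(omega, served, ack, P01, P11):
    """One-slot update of the information state vector, following (1).

    omega  : list of current ON-probabilities, one per channel
    served : index (0-based, our convention) of the served channel, or None
    ack    : True if an ACK was received, i.e. the served channel was ON
    P01/P11: per-channel transition probabilities OFF->ON and ON->ON
    """
    new = []
    for n, w in enumerate(omega):
        if n == served:
            # Feedback reveals the state exactly; take one Markov step from it.
            new.append(P11[n] if ack else P01[n])
        else:
            # No observation: propagate the belief one step forward.
            new.append(w * P11[n] + (1.0 - w) * P01[n])
    return new


# Two channels started at their stationary ON-probabilities.
P01, P11 = [0.2, 0.3], [0.8, 0.6]
omega0 = [0.2 / (0.2 + 0.2), 0.3 / (0.3 + 0.4)]
omega1 = update_belief(omega0, served=0, ack=True, P01=P01, P11=P11)
```

In this example the unserved channel keeps its stationary belief, as expected, while the served channel jumps to $P_{n,11}$ after the ACK.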

If in the most recent use of channel $n$, we observed (through 
feedback) its state was $i \in \{0, 1\}$ in slot $(t-k)$ for some $k \le t$, 
then $\omega_n(t)$ is equal to the $k$-step transition probability $P_{n,i1}^{(k)}$. In 
general, for any fixed $n$, probabilities $\omega_n(t)$ take values in the 
countably infinite set $\mathcal{W}_n = \{P_{n,01}^{(k)}, P_{n,11}^{(k)} : k \in \mathbb{N}\} \cup \{\pi_{n,\mathrm{ON}}\}$. 
By eigenvalue decomposition on $P_n$ [13, Chapter 4], we can 
show the $k$-step transition probability matrix $P_n^{(k)}$ is 

$$P_n^{(k)} \triangleq \begin{bmatrix} P_{n,00}^{(k)} & P_{n,01}^{(k)} \\ P_{n,10}^{(k)} & P_{n,11}^{(k)} \end{bmatrix} = (P_n)^k = \frac{1}{x_n} \begin{bmatrix} P_{n,10} + P_{n,01}(1 - x_n)^k & P_{n,01}\left(1 - (1 - x_n)^k\right) \\ P_{n,10}\left(1 - (1 - x_n)^k\right) & P_{n,01} + P_{n,10}(1 - x_n)^k \end{bmatrix}, \tag{2}$$

where we have defined $x_n \triangleq P_{n,01} + P_{n,10}$. Assuming that 
channels are positively correlated, i.e., $x_n < 1$, by (2) we have 
the following lemma. 

Lemma 1. For a positively correlated ($P_{n,11} > P_{n,01}$) 
Markov ON/OFF channel with transition probability matrix 
$P_n$, we have 

1) The stationary probability $\pi_{n,\mathrm{ON}} = P_{n,01} / x_n$. 

2) The $k$-step transition probability $P_{n,01}^{(k)}$ is nondecreasing 
in $k$ and $P_{n,11}^{(k)}$ is nonincreasing in $k$. Both $P_{n,01}^{(k)}$ and $P_{n,11}^{(k)}$ 
converge to $\pi_{n,\mathrm{ON}}$ as $k \to \infty$. 
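The closed form (2) and the monotonicity claims of Lemma 1 are easy to sanity-check numerically. The sketch below compares the closed form with plain repeated matrix multiplication for one channel; the parameter values and helper names are our own illustrative choices.

```python
def k_step_closed_form(P01, P10, k):
    """Return (P01^(k), P11^(k)) from the closed form (2)."""
    x = P01 + P10
    decay = (1.0 - x) ** k
    p01_k = P01 * (1.0 - decay) / x
    p11_k = (P01 + P10 * decay) / x
    return p01_k, p11_k


def k_step_by_multiplication(P01, P10, k):
    """Same two entries, obtained by multiplying the 2x2 matrix k times."""
    P = [[1.0 - P01, P01], [P10, 1.0 - P10]]
    R = [[1.0, 0.0], [0.0, 1.0]]  # identity
    for _ in range(k):
        R = [[sum(R[i][m] * P[m][j] for m in range(2)) for j in range(2)]
             for i in range(2)]
    return R[0][1], R[1][1]


# A positively correlated channel: x = P01 + P10 = 0.5 < 1.
P01, P10 = 0.2, 0.3
pi_on = P01 / (P01 + P10)
rows = [k_step_closed_form(P01, P10, k) for k in range(1, 30)]
for k, (p01_k, p11_k) in enumerate(rows[:5], start=1):
    print(k, round(p01_k, 4), round(p11_k, 4), round(pi_on, 4))
```

The printed columns approach the stationary probability from below and from above, respectively, which is the ordering that Lemma 1 describes.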

As a corollary of Lemma 1, it follows that 

$$P_{n,11} \ge P_{n,11}^{(k_1)} \ge P_{n,11}^{(k_2)} \ge \pi_{n,\mathrm{ON}} \ge P_{n,01}^{(k_3)} \ge P_{n,01}^{(k_4)} \ge P_{n,01} \tag{3}$$

for any integers $k_1 \le k_2$ and $k_3 \ge k_4$ (see Fig. 2). To 
maximize network throughput, (3) has some fundamental 
implications. We note that $\omega_n(t)$ represents the transmission 
success probability over channel $n$ in slot $t$. First, we shall keep 
serving a channel whenever its information state is $P_{n,11}$, for 
it is the best state possible. Second, given that a channel was 
OFF in its last use, its information state improves as long as the 
channel remains idle. Thus we shall wait as long as possible 
before reusing such a channel. Actually, when channels are 
symmetric ($P_n = P$ for all $n$), it is shown that a myopic 
policy with this structure maximizes the sum throughput of 
the network [7]. 

III. A Round Robin Policy 

For any integer $M \in \{1, 2, \ldots, N\}$, we present a special 
round robin policy RR(M) serving the first $M$ users 
$\{1, 2, \ldots, M\}$ in the network. The $M$ users are served in the 
circular order $1 \to 2 \to \cdots \to M \to 1 \to \cdots$. In general, we can 
use this policy to serve any subset of users. This policy is the 
fundamental building block of all the results in this paper. 

Fig. 2. Diagram of the $k$-step transition probabilities $P_{n,01}^{(k)}$ and $P_{n,11}^{(k)}$ of a 
positively correlated Markov ON/OFF channel. 




A. The Policy 

Round Robin Policy RR(M): 

1) At time 0, the base station starts with channel 1. Suppose 
initially $\omega_n(0) = \pi_{n,\mathrm{ON}}$ for all $n$. 

2) Suppose at time $t$, the base station switches to channel 
$n$. Transmit a data packet to user $n$ with probability 
$P_{n,01}^{(M)} / \omega_n(t)$ and a dummy packet otherwise. In both 
cases, we receive ACK/NACK information at the end 
of the slot. 

3) At time $(t+1)$, if a dummy packet is sent at time $t$, 
switch to channel $(n \bmod M) + 1$ and go to Step 2. 
Otherwise, keep transmitting data packets over channel 
$n$ until we receive a NACK. Then switch to channel 
$(n \bmod M) + 1$ and go to Step 2. We note that dummy 
packets are only sent on the first slot every time the 
base station switches to a new channel. 

4) Update $\omega(t)$ according to (1) in every slot. 

Step 2 of RR(M) only makes sense if $\omega_n(t) \ge P_{n,01}^{(M)}$, which 
we prove in the next lemma. 

Lemma 2. Under RR(M), whenever the base station switches 
to channel $n \in \{1, 2, \ldots, M\}$ for another round of transmission, 
its current information state satisfies $\omega_n(t) \ge P_{n,01}^{(M)}$. 

Proof of Lemma 2: See Appendix A. ■ 
We note that policy RR(M) is very conservative and not 
throughput-optimal. For example, we can improve the throughput 
by always sending data packets but no dummy ones. Also, 
it does not follow the guidelines we provide at the end of 
Section II for maximum throughput. Yet, we will see that, in 
the case of symmetric channels, throughput under RR(M) is 
close to optimal when $M$ is large. Moreover, the underlying 
analysis of RR(M) is tractable so that we can mix such 
round robin policies over different subsets of users to form 
a non-trivial inner capacity bound. RR(M) is tractable 
because it is equivalent to the following fictitious round 
robin policy (which can be proved as a corollary of Lemma 3 
provided later). 
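To make the mechanics of RR(M) concrete, the following is a small Monte Carlo sketch of the policy, assuming 0-based channel indices and a seeded pseudo-random generator of our choosing. The function names and parameter values are hypothetical; the sketch is meant only as an illustration, not as the authors' implementation.

```python
import random


def k_step_p01(P01, P10, k):
    """k-step OFF->ON probability P01^(k), from the closed form (2)."""
    x = P01 + P10
    return P01 * (1.0 - (1.0 - x) ** k) / x


def simulate_rr(P01, P10, slots, seed=0):
    """Simulate RR(M) over M = len(P01) channels; return data packets per slot."""
    rng = random.Random(seed)
    M = len(P01)
    P11 = [1.0 - q for q in P10]
    pi_on = [P01[n] / (P01[n] + P10[n]) for n in range(M)]
    state = [rng.random() < pi_on[n] for n in range(M)]  # true (hidden) states
    omega = pi_on[:]                                      # information states
    delivered = [0] * M
    n, first_slot = 0, True
    for _ in range(slots):
        if first_slot:
            # Step 2: data packet w.p. P01^(M)/omega_n(t), dummy otherwise.
            send_data = rng.random() < k_step_p01(P01[n], P10[n], M) / omega[n]
        else:
            send_data = True
        ack = state[n]  # ACK iff the served channel is ON (dummy packets too)
        if send_data and ack:
            delivered[n] += 1
        for m in range(M):
            # Belief update (1), then one true Markov transition per channel.
            if m == n:
                omega[m] = P11[m] if ack else P01[m]
            else:
                omega[m] = omega[m] * P11[m] + (1.0 - omega[m]) * P01[m]
            state[m] = rng.random() < (P11[m] if state[m] else P01[m])
        # Step 3: stay only after a delivered data packet, otherwise move on.
        if send_data and ack:
            first_slot = False
        else:
            n, first_slot = (n + 1) % M, True
    return [d / slots for d in delivered]


rates = simulate_rr([0.2, 0.3, 0.1], [0.2, 0.4, 0.3], slots=100000)
print([round(r, 3) for r in rates])
```

The empirical rates can be compared against the closed-form throughput that the paper derives later for RR(M); agreement is only up to simulation noise.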

Equivalent Fictitious Round Robin: 

1) At time 0, start with channel 1. 

2) When the base station switches to channel $n$, set its 
current information state to $P_{n,01}^{(M)}$.³ Keep transmitting 
data packets over channel $n$ until we receive a NACK. 
Then switch to channel $(n \bmod M) + 1$ and repeat 
Step 2. 

For any round robin policy that serves channels in the 
circular order $1 \to 2 \to \cdots \to M \to 1 \to \cdots$, the technique 
of resetting the information state to $P_{n,01}^{(M)}$ creates a system 
with an information state that is worse than the information 
state under the actual system. To see this, since in the actual 
system channels are served in the circular order, after we 
switch away from serving a particular channel $n$, we serve 
the other $(M - 1)$ channels for at least one slot each, and 
so we return to channel $n$ after at least $M$ slots. Thus, its 
starting information state is always at least $P_{n,01}^{(M)}$ (the proof 
is similar to that of Lemma 2). Intuitively, since information 
states represent the packet transmission success probabilities, 
resetting them to lower values degrades throughput. This is the 
reason why our inner capacity bound constructed later using 
RR(M) provides a throughput lower bound for a large class 
of policies. 

B. Network Throughput under RR(M) 

Next we analyze the throughput vector achieved by RR(M). 

1) General Case: Under RR(M), let $L_{kn}$ denote the duration 
of the $k$th time the base station stays with channel $n$. A 
sample path of the $\{L_{kn}\}$ process is 

$$(\underbrace{L_{11}, L_{12}, \ldots, L_{1M}}_{\text{round } k=1},\ \underbrace{L_{21}, L_{22}, \ldots, L_{2M}}_{\text{round } k=2},\ L_{31}, \ldots). \tag{4}$$

The next lemma presents useful properties of $L_{kn}$, which serve 
as the foundation of the throughput analysis in the rest of the 
paper. 

Lemma 3. For any integer $k$ and $n \in \{1, 2, \ldots, M\}$, 

1) The probability mass function of $L_{kn}$ is independent of 
$k$, and is 

$$L_{kn} = \begin{cases} 1 & \text{with prob. } 1 - P_{n,01}^{(M)} \\ j \ge 2 & \text{with prob. } P_{n,01}^{(M)} (P_{n,11})^{(j-2)} P_{n,10}. \end{cases}$$

As a result, for all $k \in \mathbb{N}$ we have 

$$E[L_{kn}] = 1 + \frac{P_{n,01}^{(M)}}{P_{n,10}} = 1 + \frac{P_{n,01}\left(1 - (1 - x_n)^M\right)}{x_n P_{n,10}}.$$

2) The number of data packets served in $L_{kn}$ is $(L_{kn} - 1)$. 

3) For every fixed channel $n$, time durations $L_{kn}$ are i.i.d. 
random variables over all $k$. 

Proof of Lemma 3: 

1) Note that $L_{kn} = 1$ if, on the first slot of serving channel 
$n$, either a dummy packet is transmitted or a data packet 
is transmitted but the channel is OFF. This event occurs 
with probability 

$$\left(1 - \frac{P_{n,01}^{(M)}}{\omega_n(t)}\right) + \frac{P_{n,01}^{(M)}}{\omega_n(t)} \left(1 - \omega_n(t)\right) = 1 - P_{n,01}^{(M)}.$$

Next, $L_{kn} = j \ge 2$ if in the first slot a data packet is 
successfully served, and this is followed by $(j - 2)$ 
consecutive ON slots and one OFF slot. This happens with 
probability $P_{n,01}^{(M)} (P_{n,11})^{(j-2)} P_{n,10}$. The expectation of 
$L_{kn}$ can be directly computed from the probability mass 
function. 

2) We can observe that one data packet is served in every 
slot of $L_{kn}$ except for the last one (when a dummy 
packet is sent over channel $n$, we have $L_{kn} = 1$ and 
zero data packets are served). 

3) At the beginning of every $L_{kn}$, we observe from the 
equivalent fictitious round robin policy that RR(M) 
effectively fixes $P_{n,01}^{(M)}$ as the current information state, 
regardless of the true current state $\omega_n(t)$. Neglecting 
$\omega_n(t)$ is to discard all system history, including all past 
$L_{k'n}$ for all $k' < k$. Thus $L_{kn}$ are i.i.d. Specifically, for 
any $k' < k$ and integers $l_{k'}$ and $l_k$ we have 

$$\Pr[L_{kn} = l_k \mid L_{k'n} = l_{k'}] = \Pr[L_{kn} = l_k].$$

³ In reality we cannot set the information state of a channel, and therefore 
the policy is fictitious. 

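Now we can derive the throughput vector supported by 
RR(M). Fix an integer $K > 0$. By Lemma 3, the time average 
throughput over channel $n$ after all channels finish their $K$th 
rounds, which we denote by $\mu_n(K)$, is 

$$\mu_n(K) \triangleq \frac{\sum_{k=1}^{K} (L_{kn} - 1)}{\sum_{k=1}^{K} \sum_{n=1}^{M} L_{kn}}.$$

Passing $K \to \infty$, we get 

$$\begin{aligned} \lim_{K \to \infty} \mu_n(K) &= \lim_{K \to \infty} \frac{\sum_{k=1}^{K} (L_{kn} - 1)}{\sum_{k=1}^{K} \sum_{n=1}^{M} L_{kn}} \\ &= \lim_{K \to \infty} \frac{(1/K) \sum_{k=1}^{K} (L_{kn} - 1)}{\sum_{n=1}^{M} (1/K) \sum_{k=1}^{K} L_{kn}} \\ &\overset{(a)}{=} \frac{E[L_{1n}] - 1}{\sum_{n=1}^{M} E[L_{1n}]} \\ &\overset{(b)}{=} \frac{P_{n,01}\left(1 - (1 - x_n)^M\right) / (x_n P_{n,10})}{M + \sum_{n=1}^{M} P_{n,01}\left(1 - (1 - x_n)^M\right) / (x_n P_{n,10})}, \end{aligned} \tag{5}$$

where (a) is by the Law of Large Numbers (noting by 
Lemma 3 that $L_{kn}$ are i.i.d. over $k$), and (b) is by Lemma 3. 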

2) Symmetric Case: We are particularly interested in the 
sum throughput under RR(M) when channels are symmetric, 
that is, all channels have the same statistics $P_n = P$ for all $n$. 
In this case, by channel symmetry every channel has the same 
throughput. From (5), we can show the sum throughput is 

$$\sum_{n=1}^{M} \lim_{K \to \infty} \mu_n(K) = \frac{P_{01}\left(1 - (1 - x)^M\right)}{x P_{10} + P_{01}\left(1 - (1 - x)^M\right)},$$

where in the last term the subscript $n$ is dropped due to channel 
symmetry. It is handy to define a function $c_{(\cdot)} : \mathbb{N} \to \mathbb{R}$ as 

$$c_M \triangleq \frac{P_{01}\left(1 - (1 - x)^M\right)}{x P_{10} + P_{01}\left(1 - (1 - x)^M\right)}, \quad x \triangleq P_{01} + P_{10}, \tag{6}$$

and define $c_\infty \triangleq \lim_{M \to \infty} c_M = P_{01} / (x P_{10} + P_{01})$ (note that 
$x < 1$ because every channel is positively correlated over time 
slots). The function $c_{(\cdot)}$ will be used extensively in this paper. 
We summarize the above derivation in the next lemma. 

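Lemma 4. Policy RR(M) serves channel $n \in \{1, 2, \ldots, M\}$ 
with throughput 

$$\frac{P_{n,01}\left(1 - (1 - x_n)^M\right) / (x_n P_{n,10})}{M + \sum_{n=1}^{M} P_{n,01}\left(1 - (1 - x_n)^M\right) / (x_n P_{n,10})}.$$

In particular, in symmetric channels the sum throughput under 
RR(M) is $c_M$ defined as 

$$c_M = \frac{P_{01}\left(1 - (1 - x)^M\right)}{x P_{10} + P_{01}\left(1 - (1 - x)^M\right)}, \quad x = P_{01} + P_{10},$$

and every channel has throughput $c_M / M$. 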
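As an illustration of Lemma 4, the sketch below evaluates the per-channel throughput of RR(M) for arbitrary channel parameters and checks that, with identical channels, the entries sum to $c_M$. The function names `rr_throughput` and `c` are hypothetical helpers, and the numerical values are only examples.

```python
def rr_throughput(P01, P10):
    """Per-channel throughput of RR(M) from Lemma 4, with M = len(P01)."""
    M = len(P01)
    a = []
    for p01, p10 in zip(P01, P10):
        x = p01 + p10
        # E[L_1n] - 1 = P^(M)_{n,01} / P_{n,10}
        a.append(p01 * (1.0 - (1.0 - x) ** M) / (x * p10))
    denom = M + sum(a)  # sum of E[L_1n] over the M channels
    return [a_n / denom for a_n in a]


def c(M, P01, P10):
    """Sum throughput c_M of RR(M) over M symmetric channels, as in (6)."""
    x = P01 + P10
    g = P01 * (1.0 - (1.0 - x) ** M)
    return g / (x * P10 + g)


sym = rr_throughput([0.2] * 5, [0.3] * 5)
hetero = rr_throughput([0.2, 0.3, 0.1], [0.2, 0.4, 0.3])
print(round(sum(sym), 4), round(c(5, 0.2, 0.3), 4))
print([round(v, 4) for v in hetero])
```

In the symmetric case every channel receives $c_M / M$, so the first printed pair coincides.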

We remark that the sum throughput $c_M$ of RR(M) in 
the symmetric case is nondecreasing in $M$, and thus can be 
improved by serving more channels. Interestingly, here we 
see that the sum throughput is improved by having multiuser 
diversity in the network, even though instantaneous channel 
states are never known. 

C. How Good is RR(M) ? 

Next, in symmetric channels, we quantify how close the sum 
throughput $c_M$ is to optimal. The following lemma presents a 
useful upper bound on the maximum sum throughput. 

Lemma 5 ([6], [7]). In symmetric channels, any scheduling 
policy that conforms to our model has sum throughput less than 
or equal to $c_\infty$.⁴ 

By Lemmas 4 and 5, the loss of the sum throughput of 
RR(M) is no larger than $c_\infty - c_M$. Define $\tilde{c}_M$ as 

$$\tilde{c}_M \triangleq \frac{P_{01}\left(1 - (1 - x)^M\right)}{x P_{10} + P_{01}} = c_\infty \left(1 - (1 - x)^M\right) \tag{7}$$

and note that $\tilde{c}_M \le c_M \le c_\infty$. It follows that 

$$c_\infty - c_M \le c_\infty - \tilde{c}_M = c_\infty (1 - x)^M.$$

The last term, $c_\infty (1 - x)^M$, decreases to zero geometrically fast as 
$M$ increases. This indicates that RR(M) yields near-optimal 
sum throughput even when it only serves a moderately large 
number of channels. 
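To see how quickly the gap closes, the sketch below tabulates $c_\infty - c_M$ against the geometric bound $c_\infty (1 - x)^M$ for one symmetric channel model. The parameter values and helper names are illustrative assumptions, not values taken from the paper.

```python
def c_finite(M, P01, P10):
    """Sum throughput c_M of RR(M) in symmetric channels, as in (6)."""
    x = P01 + P10
    g = P01 * (1.0 - (1.0 - x) ** M)
    return g / (x * P10 + g)


def c_infinity(P01, P10):
    """Limit of c_M as M grows, which upper bounds the sum throughput."""
    x = P01 + P10
    return P01 / (x * P10 + P01)


P01, P10 = 0.2, 0.3
x = P01 + P10
table = []
for M in (1, 2, 4, 8, 16):
    gap = c_infinity(P01, P10) - c_finite(M, P01, P10)
    bound = c_infinity(P01, P10) * (1.0 - x) ** M
    table.append((M, gap, bound))
    print(M, round(gap, 6), round(bound, 6))
```

Each doubling of $M$ roughly squares the factor $(1 - x)^M$, so a moderate number of channels already leaves little room between RR(M) and the upper bound.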

IV. Randomized Round Robin Policy, Inner and 
Outer Capacity Bound 

A. Randomized Round Robin Policy 

Lemma 4 specifies the throughput vector achieved by implementing 
RR(M) over a particular collection of $M$ channels. 
Here we are interested in the set of throughput vectors 
achievable by randomly mixing RR(M)-like policies over 
different channel subsets and allowing a different round-robin 
ordering on each subset. To generalize the RR(M) policy, 
first let $\Phi$ denote the set of all $N$-dimensional binary vectors 
excluding the all-zero vector $(0, 0, \ldots, 0)$. For any binary 
vector $\phi = (\phi_1, \phi_2, \ldots, \phi_N)$ in $\Phi$, we say channel $n$ is active 
in $\phi$ if $\phi_n = 1$. Each vector $\phi \in \Phi$ represents a different 
subset of active channels. We denote by $M(\phi)$ the number of 
active channels in $\phi$. 

⁴ We note that the throughput analysis in [6] makes a minor assumption 
on the existence of some limiting time average. Using similar ideas of [6], in 
Theorem 2 of Section IV-C we will construct an upper bound on the maximum 
sum throughput for general positively correlated Markov ON/OFF channels. 
When restricted to the symmetric case, we get the same upper bound without 
any assumption. 

For each $\phi \in \Phi$, consider the following round robin policy 
RR(φ) that serves active channels in $\phi$ in every round. 

Dynamic Round Robin Policy RR(φ): 

1) Deciding the service order in each round: 

At the beginning of each round, we denote by $\tau_n$ the 
time duration between the last use of channel $n$ and the 
beginning of the current round. Active channels in $\phi$ 
are served in the decreasing order of $\tau_n$ in this round 
(in other words, the active channel that is least recently 
used is served first). 

2) On each active channel in a round: 

a) Suppose at time $t$ the base station switches to 
channel $n$. Transmit a data packet to user $n$ with 
probability $P_{n,01}^{(M(\phi))} / \omega_n(t)$ and a dummy packet 
otherwise. In both cases, we receive ACK/NACK 
information at the end of the slot. 

b) At time $(t+1)$, if a dummy packet is sent at time 
$t$, switch to the next active channel following the 
order given in Step 1. Otherwise, keep transmitting 
data packets over channel $n$ until we receive a 
NACK. Then switch to the next active channel and 
go to Step 2a. We note that dummy packets are 
only sent on the first slot every time the base station 
switches to a new channel. 

3) Update $\omega(t)$ according to (1) in every slot. 

Using RR(φ) as building blocks, we consider the following 
class of randomized round robin policies. 
Randomized Round Robin Policy RandRR: 

1) Pick $\phi \in \Phi$ with probability $\alpha_\phi$, where $\sum_{\phi \in \Phi} \alpha_\phi = 1$. 

2) Run policy RR(φ) for one round. Then go to Step 1. 

Note that active channels may be served in different order in 
different rounds, according to the least-recently-used service 
order. This allows more time for OFF channels to return to 
better information states (note that $P_{n,01}^{(k)}$ is nondecreasing in $k$) 
and thus improves throughput. The next lemma guarantees the 
feasibility of executing any RR(φ) policy in RandRR (similar 
to Lemma 2, whenever the base station switches to a new 
channel $n$, we need $\omega_n(t) \ge P_{n,01}^{(M(\phi))}$ in Step 2a of RR(φ)). 
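The two selection steps, drawing a subset $\phi$ and ordering its active channels by least recent use, can be sketched as follows. This is an illustration only: channels are indexed from 0, `last_used` holds the slot in which each channel was last served, and both function names are hypothetical.

```python
import random


def service_order(phi, last_used, now):
    """Active channels of phi sorted by decreasing idle time tau_n,
    i.e. least recently used first (Step 1 of RR(phi))."""
    active = [n for n, bit in enumerate(phi) if bit == 1]
    return sorted(active, key=lambda n: now - last_used[n], reverse=True)


def pick_subset(alpha, rng):
    """Draw a binary vector phi with probability alpha[phi]
    (Step 1 of RandRR); alpha maps 0/1 tuples to probabilities."""
    phis = list(alpha)
    return rng.choices(phis, weights=[alpha[p] for p in phis], k=1)[0]


# Channels 0, 2 and 3 are active; channel 2 has been idle the longest.
order = service_order((1, 0, 1, 1), last_used=[5, 9, 2, 7], now=10)

rng = random.Random(0)
alpha = {(1, 1, 0): 0.5, (0, 1, 1): 0.25, (1, 1, 1): 0.25}
phi = pick_subset(alpha, rng)
print(order, phi)
```

Keeping the ordering rule separate from the random choice of $\phi$ mirrors the structure of the policy: RandRR decides which subset to serve, and RR(φ) decides in which order.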

Lemma 6. When RR(φ) is chosen by RandRR for a new 
round of transmission, every active channel $n$ in $\phi$ starts with 
information state no worse than $P_{n,01}^{(M(\phi))}$. 

Proof of Lemma 6: See Appendix B. ■ 

Although RandRR randomly selects subsets of users and 
serves them in an order that depends on previous choices, we 
can, perhaps surprisingly, still analyze its throughput. This is done by using 
the throughput analysis of RR(M), as shown in the following 
corollary to Lemma 3. 
Corollary 1. For each policy RR(φ), $\phi \in \Phi$, within time 
periods in which RR(φ) is executed by RandRR, denote by 
$L_{kn}^{\phi}$ the duration of the $k$th time the base station stays with 
active channel $n$. Then: 

1) The probability mass function of $L_{kn}^{\phi}$ is independent of 
$k$, and is 

$$L_{kn}^{\phi} = \begin{cases} 1 & \text{with prob. } 1 - P_{n,01}^{(M(\phi))} \\ j \ge 2 & \text{with prob. } P_{n,01}^{(M(\phi))} (P_{n,11})^{(j-2)} P_{n,10}. \end{cases}$$

As a result, for all $k \in \mathbb{N}$ we have 

$$E[L_{kn}^{\phi}] = 1 + \frac{P_{n,01}^{(M(\phi))}}{P_{n,10}}. \tag{8}$$

2) The number of data packets served in $L_{kn}^{\phi}$ is $(L_{kn}^{\phi} - 1)$. 

3) For every fixed $\phi$ and every fixed active channel $n$ in $\phi$, 
the time durations $L_{kn}^{\phi}$ are i.i.d. random variables over 
all $k$. 



B. Achievable Network Capacity — An Inner Capacity Bound 

Using Corollary 1, next we present the achievable rate 
region of the class of RandRR policies. For each RR(φ) policy, 
define an $N$-dimensional vector $\eta^{\phi} = (\eta_1^{\phi}, \eta_2^{\phi}, \ldots, \eta_N^{\phi})$ where 

$$\eta_n^{\phi} \triangleq \begin{cases} \dfrac{E[L_{1n}^{\phi}] - 1}{\sum_{n : \phi_n = 1} E[L_{1n}^{\phi}]} & \text{if channel } n \text{ is active in } \phi, \\ 0 & \text{otherwise,} \end{cases} \tag{9}$$

where $E[L_{1n}^{\phi}]$ is given in (8). Intuitively, by the analysis prior 
to Lemma 4, round robin policy RR(φ) yields throughput $\eta_n^{\phi}$ 
over channel $n$ for each $n \in \{1, 2, \ldots, N\}$. Incorporating all 
possible random mixtures of RR(φ) policies for different $\phi$, 
RandRR can support any data rate vector that is entrywise 
dominated by a convex combination of vectors $\{\eta^{\phi}\}_{\phi \in \Phi}$, as 
shown by the next theorem. 

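Theorem 1 (Generalized Inner Capacity Bound). The class of 
RandRR policies supports all data rate vectors $\lambda$ in the set 
$\Lambda_{\mathrm{int}}$, defined as 

$$\Lambda_{\mathrm{int}} \triangleq \left\{ \lambda \mid 0 \le \lambda \le \mu,\ \mu \in \mathrm{conv}\left( \{\eta^{\phi}\}_{\phi \in \Phi} \right) \right\},$$

where $\eta^{\phi}$ is defined in (9), $\mathrm{conv}(A)$ denotes the convex hull 
of set $A$, and $\le$ is taken entrywise. 

Proof of Theorem 1: See Appendix C. 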



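Applying Theorem 1 to symmetric channels yields the 
following corollary. 

Corollary 2 (Inner Capacity Bound for Symmetric Channels). 
In symmetric channels, the class of RandRR policies supports 
all rate vectors $\lambda \in \Lambda_{\mathrm{int}}$, where 

$$\Lambda_{\mathrm{int}} = \left\{ \lambda \mid 0 \le \lambda \le \mu,\ \mu \in \mathrm{conv}\left( \left\{ \frac{c_{M(\phi)}}{M(\phi)}\, \phi \right\}_{\phi \in \Phi} \right) \right\},$$

where $c_{M(\phi)}$ is defined in (6). 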

An example of the inner capacity bound and a simple queue- 
dependent dynamic policy that supports all data rates within 
this nontrivial inner bound will be provided later. 
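For a small number of users the vectors $\eta^{\phi}$ that span the inner bound can be enumerated explicitly. The sketch below does so from (8) and (9), taking the denominator of (9) to be the sum of $E[L_{1n}^{\phi}]$ over the active channels, in line with the throughput derivation before Lemma 4. The function names are ours, and the enumeration is exponential in $N$, so it is meant for illustration only.

```python
from itertools import product


def eta(phi, P01, P10):
    """Throughput vector of RR(phi) per (8)-(9); phi is a 0/1 tuple."""
    M = sum(phi)
    mean_L = {}
    for n, bit in enumerate(phi):
        if bit:
            x = P01[n] + P10[n]
            p_first = P01[n] * (1.0 - (1.0 - x) ** M) / x  # P^(M(phi))_{n,01}
            mean_L[n] = 1.0 + p_first / P10[n]            # equation (8)
    total = sum(mean_L.values())
    return [(mean_L[n] - 1.0) / total if bit else 0.0
            for n, bit in enumerate(phi)]


def inner_bound_vertices(P01, P10):
    """Map every nonzero binary vector phi to its throughput vector eta^phi."""
    N = len(P01)
    return {phi: eta(phi, P01, P10)
            for phi in product((0, 1), repeat=N) if any(phi)}


vertices = inner_bound_vertices([0.2, 0.2, 0.2], [0.3, 0.3, 0.3])
for phi, vec in sorted(vertices.items()):
    print(phi, [round(v, 4) for v in vec])
```

With identical channels each active entry equals $c_{M(\phi)} / M(\phi)$, so the vertices depend on $\phi$ only through which channels are active and how many.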



C. Outer Capacity Bound 

We construct an outer bound on the capacity region Λ using several novel ideas. 
First, by state aggregation, we transform the information state 
process $\{\omega_n(t)\}$ for each channel $n$ into non-stationary 
two-state Markov chains (in Fig. 4 provided later). Second, we 
create a set of bounding stationary Markov chains (in Fig. 5 
provided later), which has the structure of a multi-armed bandit 
system. Finally, we create an outer capacity bound by relating 
the bounding model to the original non-stationary Markov 
chains using stochastic coupling. We note that since the control 
of the set of information state processes $\{\omega_n(t)\}$ for all $n$ can 
be viewed as a restless bandit problem [14], it is interesting 
to see how we bound the optimal performance of a restless 
bandit problem by a related multi-armed bandit system. 

We first map channel information states $\omega_n(t)$ into modes 
for each $n \in \{1, 2, \ldots, N\}$. Inspired by (3), we observe that 
each channel $n$ must be in one of the following two modes: 

M1 The last observed state is ON, and the channel has not 
been seen (through feedback) to turn OFF. In this mode 
the information state $\omega_n(t) \in [\pi_{n,\mathrm{ON}}, P_{n,11}]$. 

M2 The last observed state is OFF, and the channel has not 
been seen to turn ON. Here $\omega_n(t) \in [P_{n,01}, \pi_{n,\mathrm{ON}}]$. 

On channel $n$, recall that $\mathcal{W}_n$ is the state space of $\omega_n(t)$, and 
define a map $f_n : \mathcal{W}_n \to \{\mathrm{M1}, \mathrm{M2}\}$ where 

$$f_n(\omega_n(t)) = \begin{cases} \mathrm{M1} & \text{if } \omega_n(t) \in (\pi_{n,\mathrm{ON}}, P_{n,11}], \\ \mathrm{M2} & \text{if } \omega_n(t) \in [P_{n,01}, \pi_{n,\mathrm{ON}}]. \end{cases}$$

This mapping is illustrated in Fig. 3. 

Fig. 3. The mapping from information states $\omega_n(t)$ to modes {M1, M2}. 
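The mode map and the per-mode best information state are simple enough to state in code. The sketch below assumes one channel with illustrative parameters; `mode` and `best_state_in_mode` are hypothetical helper names, and the second function returns the largest information state each mode can hold ($P_{n,11}$ in M1, $\pi_{n,\mathrm{ON}}$ in M2).

```python
def mode(omega, pi_on):
    """Mode map f_n: M1 strictly above the stationary probability, else M2."""
    return "M1" if omega > pi_on else "M2"


def best_state_in_mode(m, P11, pi_on):
    """Largest information state a channel can have while in mode m."""
    return P11 if m == "M1" else pi_on


P01, P10 = 0.2, 0.3
x = P01 + P10
P11, pi_on = 1.0 - P10, P01 / x

table = []
for k in range(1, 8):
    decay = (1.0 - x) ** k
    after_off = P01 * (1.0 - decay) / x    # P01^(k): last observed OFF
    after_on = (P01 + P10 * decay) / x     # P11^(k): last observed ON
    for omega in (after_off, after_on):
        m = mode(omega, pi_on)
        table.append((k, omega, m, best_state_in_mode(m, P11, pi_on)))

for row in table[:4]:
    print(row)
```

Every reachable information state is dominated by the best state of its own mode, which is the property the outer-bound construction relies on.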

For any information state process $\{\omega_n(t)\}$ (controlled by 
some scheduling policy), the corresponding mode transition 
process under $f_n$ can be represented by the Markov chains 
shown in Fig. 4. Specifically, when channel $n$ is served in 
a slot, the associated mode transition follows the upper 
non-stationary chain of Fig. 4. When channel $n$ is idled in a slot, the 
mode transition follows the lower stationary chain of Fig. 4. In 
the upper chain of Fig. 4, regardless of what the current mode is, 
mode M1 is visited in the next slot if and only if channel $n$ is 
ON in the current slot, which occurs with probability $\omega_n(t)$. 
In the lower chain of Fig. 4, when channel $n$ is idled, its 
information state changes from a $k$-step transition probability 
to the $(k+1)$-step transition probability with the same most 
recent observed channel state. Therefore, the next mode stays 
the same as the current mode. We emphasize that, in the upper 
chain of Fig. 4, at mode M1 we always have $\omega_n(t) \le P_{n,11}$, 
and at mode M2 it is $\omega_n(t) \le \pi_{n,\mathrm{ON}}$. A packet is served if 
and only if M1 is visited in the upper chain of Fig. 4. 
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1 - UJ„{t) 
Fig. 4. Mode transition diagrams for the real channel n. When channel n is
served in a slot, the next mode is M1 with probability $\omega_n(t)$ and M2
with probability $1 - \omega_n(t)$, from either mode. When channel n is idled
in a slot, each mode returns to itself with probability 1.



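Fig. 5. Mode transition diagrams for the fictitious channel n. When channel n
is served in a slot, mode M1 moves to M1 with probability $P_{n,11}$ and to M2
with probability $1 - P_{n,11}$, and mode M2 moves to M1 with probability
$\pi_{n,ON}$ and stays in M2 with probability $1 - \pi_{n,ON}$. When channel n
is idled in a slot, each mode returns to itself with probability 1.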



To upper bound throughput, we compare Fig. 4 to the mode
transition diagrams in Fig. 5 that correspond to a fictitious
model for channel n. This fictitious channel has constant
information state $\omega_n(t) = P_{n,11}$ whenever it is in mode M1,
and $\omega_n(t) = \pi_{n,ON}$ whenever it is in M2. In other words,
when the fictitious channel n is in mode M1 (or M2), it sets its
current information state to be the best state possible when the
corresponding real channel n is in the same mode. It follows
that, when both the real and the fictitious channel n are served,
the probabilities of transitions M1 → M1 and M2 → M1 in
the upper chain of Fig. 5 are greater than or equal to those in
Fig. 4, respectively. In other words, the upper chain of Fig. 5
is more likely to go to mode M1 and serve packets than that
of Fig. 4. Therefore, intuitively, if we serve both the real and
the fictitious channel n in the same infinite sequence of time
slots, the fictitious channel n will yield higher throughput for
all n. This observation is made precise by the next lemma.

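Lemma 7. Consider two discrete-time Markov chains $\{X(t)\}$
and $\{Y(t)\}$, both with state space {0, 1}. Suppose $\{X(t)\}$ is
stationary and ergodic with transition probability matrix

$$\mathbf{P} = \begin{bmatrix} P_{00} & P_{01} \\ P_{10} & P_{11} \end{bmatrix},$$

and $\{Y(t)\}$ is non-stationary with

$$\mathbf{Q}(t) = \begin{bmatrix} Q_{00}(t) & Q_{01}(t) \\ Q_{10}(t) & Q_{11}(t) \end{bmatrix}.$$

Assume $P_{01} \ge Q_{01}(t)$ and $P_{11} \ge Q_{11}(t)$ for all t. In $\{X(t)\}$,
let $\pi_X(1)$ denote the stationary probability of state 1;
$\pi_X(1) = P_{01}/(P_{01} + P_{10})$. In $\{Y(t)\}$, define

$$\pi_Y(1) \triangleq \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} Y(t)$$

as the limiting fraction of time $\{Y(t)\}$ stays at state 1. Then
we have $\pi_X(1) \ge \pi_Y(1)$.

Proof of Lemma 7: Given in Appendix E. ■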
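A quick Monte Carlo run can illustrate Lemma 7, without of course proving it. The sketch below is an assumption-laden illustration: the function name `fraction_in_state_one`, the parameter values, and the alternating transition law used for $\{Y(t)\}$ are all chosen here for demonstration and do not come from the paper.

```python
import random


def fraction_in_state_one(prob_to_one, steps, rng):
    """Simulate a chain on {0, 1}; return the fraction of slots spent in state 1.

    prob_to_one(t, state) gives the probability of being in state 1 at t + 1.
    """
    state, ones = 0, 0
    for t in range(steps):
        ones += state
        state = 1 if rng.random() < prob_to_one(t, state) else 0
    return ones / steps


P01, P11 = 0.3, 0.8                      # illustrative values only
pi_x = P01 / (P01 + (1.0 - P11))         # stationary probability of state 1


def q(t, state):
    """Time-varying law with Q01(t) <= P01 and Q11(t) <= P11 in every slot."""
    scale = 0.5 if t % 2 == 0 else 1.0
    return scale * (P11 if state == 1 else P01)


rng = random.Random(0)
frac_x = fraction_in_state_one(lambda t, s: P11 if s == 1 else P01, 200_000, rng)
frac_y = fraction_in_state_one(q, 200_000, rng)
```

Under these assumptions `frac_x` should sit close to `pi_x`, while `frac_y` should fall below it, in line with $\pi_X(1) \ge \pi_Y(1)$.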

We note that executing a scheduling policy in the network
amounts to generating a sequence of channel selection decisions.
By Lemma 7, if we apply the same sequence of channel
selection decisions of some scheduling policy to the set of 
fictitious channels, we will get higher throughput on every 
channel. A direct consequence of this is that the maximum 
sum throughput over the fictitious channels is greater than or 
equal to that over the real channels. 

Lemma 8. The maximum sum throughput over the set of
fictitious channels is no more than

$$\max_{n \in \{1, 2, \ldots, N\}} \{c_{n,\infty}\}, \qquad c_{n,\infty} = \frac{P_{n,01}}{x_n P_{n,10} + P_{n,01}}.$$



Proof of Lemma 8: We note that finding the maximum
sum throughput over fictitious channels in Fig. 5 is equivalent
to solving a multi-armed bandit problem flS) with each 
channel acting as an arm (see Fig. 5 and note that a channel can
change mode only when it is served), and one unit of reward
is earned if a packet is delivered (recall that a packet is served
if and only if mode M1 is visited in the upper chain of Fig. 5).
The optimal solution to the multi-armed bandit system is to
always play the arm (channel) with the largest average reward
(throughput). The average reward over channel n is equal to
the stationary probability of mode M1 in the upper chain of
Fig. 5, which is

$$\frac{\pi_{n,ON}}{P_{n,10} + \pi_{n,ON}} = \frac{P_{n,01}}{x_n P_{n,10} + P_{n,01}}.$$

This finishes the proof. ■
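The identity used at the end of the proof can be checked numerically. The sketch below uses hypothetical helper names (`c_inf`, `fictitious_m1_stationary`) and computes the stationary probability of M1 directly from the two-state served chain of the fictitious channel, with $x_n = P_{n,01} + P_{n,10}$.

```python
def c_inf(p01: float, p10: float) -> float:
    """c_{n,infinity} = P01 / (x * P10 + P01) with x = P01 + P10."""
    x = p01 + p10
    return p01 / (x * p10 + p01)


def fictitious_m1_stationary(p01: float, p10: float) -> float:
    """Stationary probability of M1 in the served fictitious chain.

    Transitions: M1 -> M1 w.p. P11 = 1 - P10, and M2 -> M1 w.p. pi_on.
    For a two-state chain this equals pi_on / ((1 - P11) + pi_on).
    """
    pi_on = p01 / (p01 + p10)
    return pi_on / (p10 + pi_on)


def max_fictitious_sum_throughput(channels) -> float:
    """Bound of Lemma 8 for a list of (P01, P10) pairs."""
    return max(c_inf(p01, p10) for p01, p10 in channels)
```

With `p01 = p10 = 0.2` both functions return 0.2 / 0.28, roughly 0.714.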

Together with the fact that throughput over any real channel
n cannot exceed its stationary ON probability $\pi_{n,ON}$, we have
constructed an outer bound on the network capacity region $\Lambda$
(the proof follows the above discussions and thus is omitted).

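Theorem 2. (Generalized Outer Capacity Bound): Any
supportable throughput vector $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_N)$ necessarily
satisfies

$$\lambda_n \le \pi_{n,ON}, \quad \text{for all } n \in \{1, 2, \ldots, N\},$$

$$\sum_{n=1}^{N} \lambda_n \le \max_{n \in \{1, 2, \ldots, N\}} \{c_{n,\infty}\} = \max_{n \in \{1, 2, \ldots, N\}} \left\{ \frac{P_{n,01}}{x_n P_{n,10} + P_{n,01}} \right\}.$$

These (N + 1) hyperplanes create an outer capacity bound
$\Lambda_{out}$ on $\Lambda$.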
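Theorem 2 gives a simple necessary test for whether a rate vector can be supported. The following sketch is an illustration only: the function name `in_outer_bound` and the representation of channels as `(P01, P10)` pairs are assumptions made here.

```python
def in_outer_bound(rates, channels, tol=1e-12):
    """Check the N + 1 constraints of the outer bound for a rate vector.

    rates:    list of per-user throughputs (lambda_1, ..., lambda_N)
    channels: list of (P01, P10) pairs, one per user
    """
    if len(rates) != len(channels):
        raise ValueError("one channel description is needed per user")
    sum_bound = 0.0
    for lam, (p01, p10) in zip(rates, channels):
        x = p01 + p10
        pi_on = p01 / x
        # Per-user constraint: 0 <= lambda_n <= pi_{n,ON}.
        if lam < -tol or lam > pi_on + tol:
            return False
        sum_bound = max(sum_bound, p01 / (x * p10 + p01))
    # Sum constraint: total rate at most max_n c_{n,infinity}.
    return sum(rates) <= sum_bound + tol
```

Passing this test is necessary but not sufficient for a rate vector to be supportable, since the bound is an outer one.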

Corollary 3 (Outer Capacity Bound for Symmetric Channels).
In symmetric channels with $\mathbf{P}_n = \mathbf{P}$, $c_{n,\infty} = c_\infty$, and
$\pi_{n,ON} = \pi_{ON}$ for all n, we have

$$\Lambda_{out} = \left\{ \lambda \ge 0 \;\middle|\; \sum_{n=1}^{N} \lambda_n \le c_\infty,\ \lambda_n \le \pi_{ON} \text{ for } 1 \le n \le N \right\}, \qquad (10)$$

where $\ge$ is taken entrywise.

We note that Lemma 5 in Section III-C directly follows from
Corollary 3.



D. A Two-User Example on Symmetric Channels 

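Here we consider a two-user example on symmetric channels.
For simplicity we will drop the subscript n in notations.
From Corollary 3 we have the outer bound

$$\Lambda_{out} = \left\{ (\lambda_1, \lambda_2) \;\middle|\; 0 \le \lambda_n \le P_{01}/x \text{ for } 1 \le n \le 2,\ \ \lambda_1 + \lambda_2 \le \frac{P_{01}}{x P_{10} + P_{01}},\ \ x = P_{01} + P_{10} \right\}.$$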

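For the inner bound $\Lambda_{int}$, we note that policy RandRR can
execute three round robin policies RR($\phi$) for $\phi \in \Phi =$
{(1,1), (0,1), (1,0)}. From Corollary 2 we have

$$\Lambda_{int} = \left\{ (\lambda_1, \lambda_2) \;\middle|\; 0 \le \lambda_n \le \mu_n \text{ for } 1 \le n \le 2,\ \ (\mu_1, \mu_2) \in \mathrm{conv}\left( \left\{ (c_2/2,\, c_2/2),\ (c_1,\, 0),\ (0,\, c_1) \right\} \right) \right\}.$$

Under the special case $P_{01} = P_{10} = 0.2$, the two bounds $\Lambda_{int}$
and $\Lambda_{out}$ are shown in Fig. 6.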



Fig. 6. Comparison of rate regions under different assumptions.

In Fig. 6 we also compare $\Lambda_{int}$ and $\Lambda_{out}$ with other
rate regions. Set $\Lambda_{ideal}$ is the ideal capacity region when
instantaneous channel states are known without causing any
(timing) overhead p6) . Next, it is shown in 16J that the 
maximum sum throughput in this network is achieved at point 
A = (0.325, 0.325). The (unknown) network capacity region
$\Lambda$ is bounded between $\Lambda_{int}$ and $\Lambda_{out}$, and has boundary points
B, A, and C. Since the boundary of $\Lambda$ is a concave curve
connecting B, A, and C, we envision that $\Lambda$ shall contain but
be very close to $\Lambda_{int}$.



Finally, the rate region $\Lambda_{blind}$ is rendered by completely
neglecting channel memory and treating the channels as i.i.d.
over slots [2]. We observe the throughput gain $\Lambda_{int} \setminus \Lambda_{blind}$, as
much as 23% in this example, is achieved by incorporating
channel memory. In general, if channels are symmetric and
treated as i.i.d. over slots, the maximum sum throughput in the
network is $\pi_{ON} = c_1$. Then the maximum throughput gain of
RandRR using channel memory is $c_N - c_1$, which as $N \to \infty$
converges to

$$c_\infty - c_1 = \frac{P_{01}}{x P_{10} + P_{01}} - \frac{P_{01}}{P_{01} + P_{10}},$$

which is controlled by the factor $x = P_{01} + P_{10}$.
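The numbers of this two-user example can be reproduced in a few lines. The sketch below assumes the expression for $c_M$ that the paper states in Lemma 9 (with M in place of $d(v)$); the function name `c` and the variable names are illustrative only.

```python
def c(m: int, p01: float, p10: float) -> float:
    """Sum throughput c_M of round robin over M symmetric channels."""
    x = p01 + p10
    a = 1.0 - (1.0 - x) ** m
    return p01 * a / (x * p10 + p01 * a)


p01 = p10 = 0.2
x = p01 + p10

c1 = c(1, p01, p10)                  # equals pi_ON = P01 / (P01 + P10)
c2 = c(2, p01, p10)                  # two-user round robin
c_inf = p01 / (x * p10 + p01)        # limit of c_M as M grows

relative_gain = c2 / c1 - 1.0        # gain of using channel memory, N = 2
limit_gain = c_inf - c1              # absolute gain as N -> infinity
```

Here `c1` is 0.5, `c2` is about 0.615, and the relative gain is about 23%, which matches the figure quoted for this example.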

E. A Heuristically Tighter Inner Bound 

It is shown in [7] that the following policy maximizes the
sum throughput in a symmetric network:

Serve channels in a circular order, where on each 
channel keep transmitting data packets until a NACK 
is received. 

Fig. 7. Comparison of our inner bound $\Lambda_{int}$, the unknown network capacity
region $\Lambda$, and a heuristically better inner bound $\Lambda_{heuristic}$.

In the above two-user example, this policy achieves throughput
vector A in Fig. 7. If we replace our round robin policy
RR($\phi$) by this one, heuristically we are able to construct a
tighter inner capacity bound. For example, we can support
the tighter inner bound $\Lambda_{heuristic}$ in Fig. 7 by appropriate time
sharing among instances of the above policy that serve different subsets
of channels. However, we note that this approach is difficult 
to analyze because the {Lkn} process (see Q) forms a high- 
order Markov chain. Yet, our inner bound $\Lambda_{int}$ provides a good
throughput guarantee for this class of heuristic policies.

V. Proximity of the Inner Bound to the True 
Capacity Region — Symmetric Case 

Next we bound the closeness of the boundaries of $\Lambda_{int}$
and $\Lambda$ in the case of symmetric channels. In Section III-C,
by choosing M = N, we have provided such analysis for
the boundary point in the direction (1, 1, ..., 1). Here we
generalize to all boundary points. Define

$$\mathcal{V} \triangleq \left\{ (v_1, v_2, \ldots, v_N) \;\middle|\; v_n \ge 0 \text{ for } 1 \le n \le N,\ v_n > 0 \text{ for at least one } n \right\}$$

as a set of directional vectors. For any $v \in \mathcal{V}$, let
$\lambda^{int} = (\lambda_1^{int}, \lambda_2^{int}, \ldots, \lambda_N^{int})$ and
$\lambda^{out} = (\lambda_1^{out}, \lambda_2^{out}, \ldots, \lambda_N^{out})$ be the
boundary point of $\Lambda_{int}$ and $\Lambda_{out}$ in the direction of v,
respectively. It is useful to compute $\sum_{n=1}^{N} (\lambda_n^{out} - \lambda_n^{int})$, because it
upper bounds the loss of the sum throughput of $\Lambda_{int}$ from $\Lambda$ in
the direction of v.⁵ We note that computing $\lambda^{int}$ in an arbitrary
direction is difficult. Thus we will find an upper bound on
$\sum_{n=1}^{N} (\lambda_n^{out} - \lambda_n^{int})$.

A. Preliminary 

To gain more intuition on $\Lambda_{int}$, we start with a toy example
of N = 3 users. We are interested in the boundary point of $\Lambda_{int}$
in the direction of v = (1, 2, 1). Consider two RandRR-type
policies $\psi_1$ and $\psi_2$ defined as follows.

For $\psi_1$, choose
  $\phi^1$ = (1,0,0) with prob. 1/4,
  $\phi^2$ = (0,1,0) with prob. 1/2,
  $\phi^3$ = (0,0,1) with prob. 1/4.

For $\psi_2$, choose
  $\phi^4$ = (1,1,0) with prob. 1/2,
  $\phi^5$ = (0,1,1) with prob. 1/2.

Both $\psi_1$ and $\psi_2$ support data rates in the direction of (1, 2, 1).
However, using the analysis of Lemma 4 and Theorem 1, we
know $\psi_1$ supports throughput vector

$$\frac{c_1}{4} \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix},$$

while $\psi_2$ supports

$$\frac{1}{2} \begin{bmatrix} c_2/2 \\ c_2/2 \\ 0 \end{bmatrix} + \frac{1}{2} \begin{bmatrix} 0 \\ c_2/2 \\ c_2/2 \end{bmatrix} = \frac{c_2}{4} \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix},$$

where $c_1$ and $c_2$ are defined in (6). We see that $\psi_2$ achieves
data rates closer than $\psi_1$ does to the boundary of $\Lambda_{int}$. It is
because every sub-policy of $\psi_2$, namely RR($\phi^4$) and RR($\phi^5$),
supports sum throughput $c_2$ (by Lemma 4), whereas those of $\psi_1$
only support $c_1$. In other words, policy $\psi_2$ has better multiuser
diversity gain than $\psi_1$ does. This example suggests that we
can find a good lower bound on $\lambda^{int}$ by exploring to what
extent the multiuser diversity can be exploited. We start with
the following definition.

Definition 1. For any $v \in \mathcal{V}$, we say v is d-user diverse if
v can be written as a positive combination of vectors in $\Phi_d$,
where $\Phi_d$ denotes the set of N-dimensional binary vectors
having d entries equal to 1. Define

$$d(v) \triangleq \max_{1 \le d \le N} \{ d \mid v \text{ is } d\text{-user diverse} \},$$

and we shall say v is maximally d(v)-user diverse.

⁵Note that $\sum_{n=1}^{N} (\lambda_n^{out} - \lambda_n^{int})$ also bounds the closeness between $\Lambda_{out}$ and
$\Lambda$.

The notion of d(v) is well-defined because every v must be
1-user diverse.⁶ Definition 1 is the most useful to us through
the next lemma.

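Lemma 9. The boundary point of $\Lambda_{int}$ in the direction of
$v \in \mathcal{V}$ has sum throughput at least $c_{d(v)}$, where

$$c_{d(v)} \triangleq \frac{P_{01}\left(1 - (1-x)^{d(v)}\right)}{x P_{10} + P_{01}\left(1 - (1-x)^{d(v)}\right)}, \qquad x \triangleq P_{01} + P_{10}.$$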



Proof of Lemma 9: If direction v can be written as a
positive weighted sum of vectors in $\Phi_{d(v)}$, we can normalize
the weights, and use the new weights as probabilities to
randomly mix RR($\phi$) policies for all $\phi \in \Phi_{d(v)}$. This way
we achieve sum throughput $c_{d(v)}$ in every transmission round,
and overall the throughput vector will be in the direction of
v. Therefore the result follows. For details, see Appendix G.
■
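For a concrete direction v, the guarantee of Lemma 9 can be evaluated numerically. The sketch below rests on an assumption that is not stated in the paper: it computes d(v) with the closed-form test $d \cdot \max_n v_n \le \sum_n v_n$, which follows from the standard description of the convex hull of 0/1 vectors with exactly d ones (the hypersimplex). The function names are hypothetical.

```python
from fractions import Fraction


def max_user_diversity(v) -> int:
    """Largest d such that v is a positive combination of d-ones binary vectors.

    Assumed characterization: v lies in the cone of Phi_d if and only if
    d * max(v) <= sum(v), so d(v) = floor(sum(v) / max(v)).
    """
    vals = [Fraction(x) for x in v]          # exact arithmetic avoids rounding
    if any(x < 0 for x in vals) or all(x == 0 for x in vals):
        raise ValueError("v must be nonnegative with a positive entry")
    return int(sum(vals) // max(vals))


def sum_throughput_guarantee(v, p01: float, p10: float) -> float:
    """Lower bound c_{d(v)} of Lemma 9 for symmetric channels."""
    d = max_user_diversity(v)
    x = p01 + p10
    a = 1.0 - (1.0 - x) ** d
    return p01 * a / (x * p10 + p01 * a)
```

For the toy direction (1, 2, 1) this gives d(v) = 2, consistent with the policy that mixes RR over the two-user subsets (1,1,0) and (0,1,1).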

Fig. 8 provides an example of Lemma 9 in the two-user
symmetric system in Section IV-D. We observe that
direction (1, 1), the one that passes point D in Fig. 8, is
the only direction that is maximally 2-user diverse. The sum
throughput $c_2$ is achieved at D. All the other directions
are maximally 1-user diverse and, from Fig. 8, only
sum throughput $c_1$ is guaranteed along those directions. In
general, geometrically we can show that a maximally d-user
diverse vector, say $v_d$, forms a smaller angle with the all-1
vector (1, 1, ..., 1) than a maximally d'-user diverse vector,
say $v_{d'}$, does if d' < d. In other words, data rates along $v_d$
are more balanced than those along $v_{d'}$. Lemma 9 states that
we guarantee to support higher sum throughput if the user
traffic is more balanced.

Fig. 8. An example for Lemma 9 in the two-user symmetric network. Points
B and C achieve sum throughput $c_1 = \pi_{ON} = 0.5$, and the sum throughput
at D is $c_2 \approx 0.615$. Any other boundary point of $\Lambda_{int}$ has sum throughput
between $c_1$ and $c_2$.



⁶The set $\Phi_1 = \{e_1, e_2, \ldots, e_N\}$ is the collection of unit coordinate
vectors where $e_n$ has its nth entry equal to 1 and 0 otherwise. Any vector
$v \in \mathcal{V}$, $v = (v_1, v_2, \ldots, v_N)$, can be written as $v = \sum_{n: v_n > 0} v_n e_n$.

B. Proximity Analysis

We use the notion of d(v) to upper bound $\sum_{n=1}^{N} (\lambda_n^{out} - \lambda_n^{int})$
in any direction $v \in \mathcal{V}$. Let $\lambda^{out} = \theta \lambda^{int}$ (i.e., $\lambda_n^{out} = \theta \lambda_n^{int}$
for all n) for some $\theta \ge 1$. By (10), the boundary of $\Lambda_{out}$ is
characterized by the interaction of the (N + 1) hyperplanes
$\sum_{n=1}^{N} \lambda_n = c_\infty$ and $\lambda_n = \pi_{ON}$ for each $n \in \{1, 2, \ldots, N\}$.
Specifically, in any given direction, if we consider the cross
points on all the hyperplanes in that direction, the boundary
point $\lambda^{out}$ is the one closest to the origin. We do not know
which hyperplane $\lambda^{out}$ is on, and thus need to consider all
(N + 1) cases. If $\lambda^{out}$ is on the plane $\sum_{n=1}^{N} \lambda_n = c_\infty$, i.e.,
$\sum_{n=1}^{N} \lambda_n^{out} = c_\infty$, we get

$$\sum_{n=1}^{N} (\lambda_n^{out} - \lambda_n^{int}) \overset{(a)}{\le} c_\infty - c_{d(v)} \overset{(b)}{\le} c_\infty (1 - x)^{d(v)},$$

where (a) is by Lemma 9 and (b) is by (7). If $\lambda^{out}$ is on the
plane $\lambda_n = \pi_{ON}$ for some n, then $\theta = \pi_{ON} / \lambda_n^{int}$. It follows

N N . 

E(^r - ^n) = - 1) E ^ ^ - 1 

n=l n=l ^ 

The above discussions lead to the next lemma. 

Lemma 10. The loss of the sum throughput of $\Lambda_{int}$ from $\Lambda$ in
the direction of v is upper bounded by



>(1 



mm 

l<ri<Af 



(l-x) 



TTON 



1 



(11) 



maxi<„<Ar{A»"} 

Lemma 10 shows that, if data rates are more balanced,
namely, have a larger d(v), the sum throughput loss is
dominated by the first term in the minimum of (11) and decreases to
0 geometrically fast with d(v). If data rates are biased toward
a particular user, the second term in the minimum of (11)
captures the throughput loss, which goes to 0 as the rate of
the favored user goes to the single-user capacity $\pi_{ON}$.

VI. Throughput-Achieving Queue-dependent
Round Robin Policy

Let $a_n(t)$, for $1 \le n \le N$, be the number of exogenous
packet arrivals destined for user n in slot t. Suppose $a_n(t)$ are
independent across users, i.i.d. over slots with rate $E[a_n(t)] = \lambda_n$,
and $a_n(t)$ is bounded with $0 \le a_n(t) \le A_{max}$, where
$A_{max}$ is a finite integer. Let $U_n(t)$ be the backlog of user-n
packets queued at the base station at time t. Define $U(t) \triangleq
(U_1(t), U_2(t), \ldots, U_N(t))$ and suppose $U_n(0) = 0$ for all n.
The queue process $\{U_n(t)\}$ evolves as

$$U_n(t+1) = \max\left[ U_n(t) - \mu_n(s_n(t), t),\ 0 \right] + a_n(t), \qquad (12)$$

where $\mu_n(s_n(t), t) \in \{0, 1\}$ is the service rate allocated to
user n in slot t. We have $\mu_n(s_n(t), t) = 1$ if user n is served
and $s_n(t) = ON$, and 0 otherwise. In the rest of the paper
we drop $s_n(t)$ in $\mu_n(s_n(t), t)$ and use $\mu_n(t)$ for notational
simplicity. We say the network is (strongly) stable if

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{n=1}^{N} E[U_n(\tau)] < \infty.$$
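The queue dynamics (12) translate directly into code. The sketch below is illustrative only: the function name `queue_step` and its argument layout are assumptions, with at most one user served per slot as in the single-server downlink considered here.

```python
def queue_step(backlog, served_user, channel_on, arrivals):
    """Advance all queues by one slot according to (12).

    backlog:     list of U_n(t)
    served_user: index of the user scheduled in this slot, or None
    channel_on:  list of booleans, True if s_n(t) = ON
    arrivals:    list of a_n(t)
    """
    next_backlog = []
    for n, (u, a) in enumerate(zip(backlog, arrivals)):
        # mu_n(t) = 1 only if user n is served and its channel is ON.
        mu = 1 if (n == served_user and channel_on[n]) else 0
        next_backlog.append(max(u - mu, 0) + a)
    return next_backlog
```

For instance, `queue_step([2, 0, 1], 0, [True, False, True], [1, 0, 2])` serves one packet of user 0 and then adds the new arrivals.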



Consider a rate vector $\lambda$ interior to the inner capacity region
bound $\Lambda_{int}$ given in Theorem 1. Namely, there exists an $\epsilon > 0$
and a probability distribution $\{\beta_\phi\}_{\phi \in \Phi}$ such that

$$\lambda_n + \epsilon \le \sum_{\phi \in \Phi} \beta_\phi \eta_n^\phi, \quad \text{for all } 1 \le n \le N, \qquad (13)$$

where $\eta_n^\phi$ is defined in (9). By Theorem 1, there exists a
RandRR policy that yields service rates equal to the right
side of (13) and thus stabilizes the network with arrival rate
vector $\lambda$ [17, Lemma 3.6]. The existence of this policy is
useful and we shall denote it by RandRR*. Recall that on
each new scheduling round, the policy RandRR* randomly
picks a binary vector $\phi$ using probabilities $\alpha_\phi$ (defined over
all of the $(2^N - 1)$ subsets of users). The $M(\phi)$ active users
in $\phi$ are served for one round by the round robin policy
RR($\phi$), serving the least recently used users first. However,
solving for the probabilities needed to implement the RandRR*
policy that yields (13) is intractable when N is large, because
we need to find $(2^N - 1)$ unknown probabilities $\{\alpha_\phi\}_{\phi \in \Phi}$,
compute $\{\beta_\phi\}_{\phi \in \Phi}$ from (19), and make (13) hold. Instead
of probabilistically finding the vector for the current round
of scheduling, we use the following simple queue-dependent
policy.

Queue-dependent Round Robin Policy (QRR):

1) Start with t = 0.

2) At time t, observe the current queue backlog vector U(t)
and find the binary vector $\phi(t) \in \Phi$ defined as⁷

$$\phi(t) \triangleq \arg\max_{\phi \in \Phi} f(U(t), RR(\phi)), \qquad (14)$$

where

$$f(U(t), RR(\phi)) \triangleq \sum_{n: \phi_n = 1} \left[ U_n(t)\, E\left[L_{1n}^\phi - 1\right] - E\left[L_{1n}^\phi\right] \sum_{n=1}^{N} U_n(t) \lambda_n \right]$$

and, as established earlier, $E[L_{1n}^\phi] = 1 + P_{n,01}^{(M(\phi))} / P_{n,10}$. Ties are
broken arbitrarily.

3) Run RR($\phi(t)$) for one round of transmission. We
emphasize that active channels in $\phi(t)$ are served in the
least-recently-used order. After the round ends, go to Step 2.

The QRR policy is a frame-based algorithm similar to
RandRR, except that at the beginning of every transmission
round the policy selection is no longer random but based on a
queue-dependent rule. We note that QRR is a polynomial time
algorithm because we can compute $\phi(t)$ in (14) in polynomial
time with the following divide and conquer approach:

1) Partition the set $\Phi$ into subsets $\{\Phi_1, \ldots, \Phi_N\}$, where
$\Phi_M$, $M \in \{1, \ldots, N\}$, is the set of N-dimensional
binary vectors having exactly M entries equal to 1.

2) For each $M \in \{1, \ldots, N\}$, find the maximizer of
$f(U(t), RR(\phi))$ among vectors in $\Phi_M$. For each $\phi \in
\Phi_M$, we have

$$f(U(t), RR(\phi)) = \sum_{n: \phi_n = 1} \left[ U_n(t) \frac{P_{n,01}^{(M)}}{P_{n,10}} - \left( 1 + \frac{P_{n,01}^{(M)}}{P_{n,10}} \right) \sum_{n=1}^{N} U_n(t) \lambda_n \right],$$

and the maximizer of $f(U(t), RR(\phi))$ is to activate the
M channels that yield the M largest summands of the
above equation.

3) Obtain $\phi(t)$ by comparing the maximizers from the
above step for different values of M.

The detailed implementation is as follows.

⁷The vector $\phi(t)$ is a queue-dependent decision and thus we should write
$\phi(U(t), t)$ as a function of U(t). For simplicity we use $\phi(t)$ instead.
Polynomial time implementation of Step 2 of QRR:

1) For each fixed $M \in \{1, \ldots, N\}$, we do the following:
Compute

$$U_n(t) \frac{P_{n,01}^{(M)}}{P_{n,10}} - \left( 1 + \frac{P_{n,01}^{(M)}}{P_{n,10}} \right) \sum_{n=1}^{N} U_n(t) \lambda_n \qquad (15)$$

for all $n \in \{1, \ldots, N\}$. Sort these N numbers and
define the binary vector $\phi^M = (\phi_1^M, \ldots, \phi_N^M)$ such that
$\phi_n^M = 1$ if the value (15) of channel n is among the M
largest, otherwise $\phi_n^M = 0$. Ties are broken arbitrarily.
Let $f(U(t), M)$ denote the sum of the M largest values
of (15).

2) Define $M(t) \triangleq \arg\max_{1 \le M \le N} f(U(t), M)$. Then we
assign $\phi(t) = \phi^{M(t)}$.
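The two steps above can be sketched in code as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, channels are given as `(P01, P10)` pairs with `P10 > 0`, and the k-step probability uses the standard two-state Markov chain formula $P_{01}^{(k)} = P_{01} (1 - (1 - x)^k) / x$ with $x = P_{01} + P_{10}$. A brute-force search over all nonzero binary vectors is included only as a cross-check.

```python
from itertools import product


def k_step_p01(p01, p10, k):
    """k-step OFF-to-ON transition probability of a two-state chain."""
    x = p01 + p10
    return p01 * (1.0 - (1.0 - x) ** k) / x


def f_value(phi, backlog, rates, channels):
    """Evaluate f(U(t), RR(phi)) for a nonzero binary vector phi."""
    m = sum(phi)
    weighted = sum(u * lam for u, lam in zip(backlog, rates))
    total = 0.0
    for n, (p01, p10) in enumerate(channels):
        if phi[n] == 1:
            ratio = k_step_p01(p01, p10, m) / p10   # E[L - 1]
            total += backlog[n] * ratio - (1.0 + ratio) * weighted
    return total


def qrr_select(backlog, rates, channels):
    """Step 2 of QRR: sort the values (15) for each M, keep the best M."""
    n_users = len(backlog)
    weighted = sum(u * lam for u, lam in zip(backlog, rates))
    best_value, best_phi = None, None
    for m in range(1, n_users + 1):
        scores = []
        for n, (p01, p10) in enumerate(channels):
            ratio = k_step_p01(p01, p10, m) / p10
            scores.append((backlog[n] * ratio - (1.0 + ratio) * weighted, n))
        top = sorted(scores, reverse=True)[:m]      # M largest values of (15)
        value = sum(score for score, _ in top)
        if best_value is None or value > best_value:
            chosen = {n for _, n in top}
            best_value = value
            best_phi = tuple(1 if n in chosen else 0 for n in range(n_users))
    return best_phi, best_value


def brute_force_select(backlog, rates, channels):
    """Exhaustive maximization over all 2^N - 1 nonzero vectors (for checking)."""
    candidates = [p for p in product((0, 1), repeat=len(backlog)) if any(p)]
    return max(f_value(p, backlog, rates, channels) for p in candidates)
```

The sorting-based routine costs on the order of $N^2 \log N$ operations per round, whereas the exhaustive search grows as $2^N$; both should return the same maximum value.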

Using a novel variable-length frame-based Lyapunov
analysis, we show in the next theorem that QRR stabilizes the
network with any arrival rate vector $\lambda$ strictly within the inner
capacity bound $\Lambda_{int}$.⁸ The idea is that we compare QRR with
the (unknown) policy RandRR* that stabilizes $\lambda$. We show
that, in every transmission round, QRR finds and executes
a round robin policy RR($\phi(t)$) that yields a larger negative
drift on the queue backlogs than RandRR* does in the current
round. Therefore, QRR is stable.

Theorem 3. For any data rate vector $\lambda$ interior to $\Lambda_{int}$, policy
QRR strongly stabilizes the network.

Proof of Theorem 3: See Appendix H. ■

VII. Conclusion 

The network capacity of a wireless network is practically
degraded by communication overhead. In this paper, we take
a step forward by studying the fundamental achievable rate
region when communication overhead is kept to a minimum, that
is, when channel probing is not permitted. While solving the
original problem is difficult, we construct an inner and an
outer bound on the network capacity region, with the aid
of channel memory. When channels are symmetric and the
network serves a large number of users, we show the inner
and outer bounds become progressively tighter as the data rates
of different users become more balanced. We also derive a simple
queue-dependent frame-based policy, as a function of packet
arrival rates and channel statistics, and show that this policy
stabilizes the network for any data rates strictly within the
inner capacity bound.

*In jSOj we show that as long as the queue backlog vector U it) is not iden- 
tically zero the arrival rate vector A is interior to the inner capacity bound Ain(, 
in Step|2]of the QRR policy we always have max<^g5j f{U (t), RR(<^)) > 0. 



Transmitting data without channel probing is one of the 
many options for communication over a wireless network. 
Practically each option may have pros and cons on criteria like 
the achievable throughput, power efficiency, implementation 
complexity, etc. In the future it is important to explore how to 
combine all possible options to push the practically achievable 
network capacity to the limit. It is part of our future work 
to generalize the methodology and framework developed in 
this paper to more general cases, such as when limited 
probing is allowed and/or other QoS metrics such as energy 
consumption are considered. It will also be interesting to see 
how this framework can be applied to solve new problems in 
opportunistic spectrum access in cognitive radio networks, in 
opportunistic scheduling with delayed/uncertain channel state 
information, and in restless bandit problems. 



Appendix A 

Proof of Lemma 2: Initially, by (3) we have $\omega_n(0) =
\pi_{n,ON} \ge P_{n,01}^{(M)}$ for all n. Suppose the base station switches to
channel n at time t, and the last use of channel n ends at slot
(t − k) for some $k \le t$. In slot (t − k), there are two possible
cases:

1) Channel n turns OFF, and as a result the information
state on slot t is $\omega_n(t) = P_{n,01}^{(k)}$. Due to round robin, the
other (M − 1) channels must have been used for at least
one slot before t after slot (t − k), and thus $k \ge M$.
By (3) we have $\omega_n(t) = P_{n,01}^{(k)} \ge P_{n,01}^{(M)}$.

2) Channel n is ON and transmits a dummy packet. Thus
$\omega_n(t) = P_{n,11}^{(k)}$. Again by (3) we have $\omega_n(t) = P_{n,11}^{(k)} \ge P_{n,01}^{(M)}$.



Appendix B 

Proof of Lemma 6: At the beginning of a new round,
suppose round robin policy RR($\phi$) is selected. We index the
$M(\phi)$ active channels in $\phi$ as $(n_1, n_2, \ldots, n_{M(\phi)})$, which is
in the decreasing order of the time duration between their last
use and the beginning of the current round. In other words,
the last use of $n_k$ is earlier than that of $n_{k'}$ only if k < k'. Fix
an active channel $n_k$. Then it suffices to show that when this
channel is served in the current round, the time duration back
to the end of its last service is at least $(M(\phi) - 1)$ slots (that
this channel has information state no worse than $P_{n_k,01}^{(M(\phi))}$ then
follows the same arguments in the proof of Lemma 2).

We partition the active channels in $\phi$ other than $n_k$
into two sets $A = \{n_1, n_2, \ldots, n_{k-1}\}$ and $B =
\{n_{k+1}, n_{k+2}, \ldots, n_{M(\phi)}\}$. Then the last use of every channel
in B occurs after the last use of $n_k$, and so channel $n_k$ has
been idled for at least |B| slots at the start of the current round.
However, the policy in this round will serve all channels in
A before serving $n_k$, taking at least one slot per channel, and
so we wait at least |A| additional slots before serving channel
$n_k$. The total time that this channel has been idled is thus at
least $|A| + |B| = M(\phi) - 1$. ■

Appendix C

Proof of Theorem 1: Let Z(t) denote the number of times
Step 1 of RandRR is executed in [0, t), in which we suppose
vector $\phi$ is selected $Z_\phi(t)$ times. Define $t_i$, where $i \in \mathbb{Z}^+$, as
the (i + 1)th time instant a new vector $\phi$ is selected. Assume
$t_0 = 0$, and thus the first selection occurs at time 0. It follows
that $Z(t_i^-) = i$, $Z(t_i) = i + 1$, and the ith round of packet
transmissions ends at time $t_i^-$.

Fix a vector $\phi$. Within the time periods in which policy
RR($\phi$) is executed, denote by $L_{kn}^\phi$ the duration of the kth time
the base station stays with channel n. Then the time average
throughput that policy RR($\phi$) yields on its active channel n
over $[0, t_i)$ is



TV 

--1 ^kr 



(16) 



For simplicity, here we focus on discrete time instants $\{t_i\}$
large enough so that $Z_\phi(t_i) > 0$ for all $\phi \in \Phi$ (so that the
sums in (16) make sense). The generalization to arbitrary time
t can be done by incorporating fractional transmission rounds,
which are amortized over time. Next, rewrite (16) as



k=l Z^n-.d 



TV 

= 1 ^kn 



^kn ^ 



(t>e<S> Z^fe=l Z^n:0„ = l ^kn Z^fe=l Z^n:0„ = l ^kn 



(17) 



(*) 



As t 



oo, the second term (*) of ( [T7| i satisfies 



E l Y^^tfr(ti) t4> 
n:0„ = l Zrf,(ti) Z^fc=l -^fcn 



E 



E 



n:0„ — 1 



E 



where (a) is by the Law of Large Numbers (we have shown in
Corollary 1 that $L_{kn}^\phi$ are i.i.d. for different k) and (b) by (9).

Denote the first term of (17) by $\beta_\phi(t_i)$, where we note that
$\beta_\phi(t_i) \in [0, 1]$ for all $\phi \in \Phi$ and $\sum_{\phi \in \Phi} \beta_\phi(t_i) = 1$. We can
rewrite $\beta_\phi(t_i)$ as





'Z^iU) 
[ Z(U) \ 


E 


n:0Ti — 1 


1 \^Z^{ti) j^(f, 
Z^(ti) Z^fc=l ^kn 






\Z4,{tM 
{ Z{U) \ 




= 1 


1 \^Z^(ti) j(f, 
Z^{ti) l^k=l ^kn 



As i — > OO, we have 



lim 



:) 



"0 En:0„ = 


^E 


Lin 




E^e* '^4' En: 


<t>-n = 


E 





(18) 



where by the Law of Large Numbers we have 

Z^t,) 1 



Z{t,) 



Z(j,{ti) 



z^(U) 

/c=l 



E 



^In 



From (16)(17)(18), we have shown that the throughput
contributed by policy RR($\phi$) on its active channel n is $\beta_\phi \eta_n^\phi$.
Consequently, RandRR parameterized by $\{\alpha_\phi\}_{\phi \in \Phi}$ supports
any data rate vector $\lambda$ that is entrywise dominated by $\lambda \le
\sum_{\phi \in \Phi} \beta_\phi \eta^\phi$, where $\{\beta_\phi\}_{\phi \in \Phi}$ is defined in (18) and $\eta^\phi$ in (9).

The above analysis shows that every RandRR policy
achieves a boundary point of $\Lambda_{int}$ defined in Theorem 1.
Conversely, the next lemma, proved in Appendix D, shows that
every boundary point of $\Lambda_{int}$ is achievable by some RandRR
policy, and the proof is complete.

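Lemma 11. For any probability distribution $\{\beta_\phi\}_{\phi \in \Phi}$, there
exists another probability distribution $\{\alpha_\phi\}_{\phi \in \Phi}$ that solves
the linear system

$$\beta_\phi = \frac{\alpha_\phi \sum_{n: \phi_n = 1} E\left[L_{1n}^\phi\right]}{\sum_{\phi \in \Phi} \alpha_\phi \sum_{n: \phi_n = 1} E\left[L_{1n}^\phi\right]}, \quad \text{for all } \phi \in \Phi. \qquad (19)$$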



Appendix D 



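Proof of Lemma 11: For any probability distribution
$\{\beta_\phi\}_{\phi \in \Phi}$, we prove the lemma by inductively constructing the
solution $\{\alpha_\phi\}_{\phi \in \Phi}$ to (19). The induction is on the cardinality
of $\Phi$. Without loss of generality, we index elements in $\Phi$
by $\Phi = \{\phi^1, \phi^2, \ldots\}$, where $\phi^k = (\phi_1^k, \ldots, \phi_N^k)$. We
define $\chi_k \triangleq \sum_{n: \phi_n^k = 1} E[L_{1n}^{\phi^k}]$ and redefine $\beta_{\phi^k} \triangleq \beta_k$ and
$\alpha_{\phi^k} \triangleq \alpha_k$. Then we can rewrite (19) as

$$\beta_k = \frac{\alpha_k \chi_k}{\sum_{1 \le k \le |\Phi|} \alpha_k \chi_k}, \quad \text{for all } k \in \{1, 2, \ldots, |\Phi|\}. \qquad (20)$$

We first note that $\Phi = \{\phi^1\}$ is a degenerate case where
$\beta_1$ and $\alpha_1$ must both be 1. When $\Phi = \{\phi^1, \phi^2\}$, for any
probability distribution $\{\beta_1, \beta_2\}$ with positive elements,⁹ it is
easy to show

$$\alpha_1 = \frac{\chi_2 \beta_1}{\chi_1 \beta_2 + \chi_2 \beta_1}, \qquad \alpha_2 = 1 - \alpha_1.$$

Let $\Phi = \{\phi^k : 1 \le k \le K\}$ for some $K \ge 2$. Assume that
for any probability distribution $\{\beta_k > 0 : 1 \le k \le K\}$ we
can find $\{\alpha_k : 1 \le k \le K\}$ that solves (20).

For the case $\Phi = \{\phi^k : 1 \le k \le K + 1\}$ and any $\{\beta_k >
0 : 1 \le k \le K + 1\}$, we construct the solution $\{\alpha_k : 1 \le
k \le K + 1\}$ to (19) as follows. Let $\{\gamma_2, \gamma_3, \ldots, \gamma_{K+1}\}$ be the
solution to the linear system

$$\frac{\gamma_k \chi_k}{\sum_{k=2}^{K+1} \gamma_k \chi_k} = \frac{\beta_k}{\sum_{k=2}^{K+1} \beta_k}, \quad 2 \le k \le K + 1. \qquad (21)$$

By the induction assumption, the set $\{\gamma_2, \gamma_3, \ldots, \gamma_{K+1}\}$ exists
and satisfies $\gamma_k \in [0, 1]$ for $2 \le k \le K + 1$ and $\sum_{k=2}^{K+1} \gamma_k = 1$.
Define

$$\alpha_1 \triangleq \frac{\beta_1 \sum_{k=2}^{K+1} \gamma_k \chi_k}{\chi_1 (1 - \beta_1) + \beta_1 \sum_{k=2}^{K+1} \gamma_k \chi_k}, \qquad (22)$$

$$\alpha_k \triangleq (1 - \alpha_1) \gamma_k, \quad 2 \le k \le K + 1. \qquad (23)$$

⁹If one element of $\{\beta_1, \beta_2\}$ is zero, say $\beta_2 = 0$, we can show necessarily
$\alpha_2 = 0$ and it degenerates to the one-policy case $\Phi = \{\phi^1\}$. Such
degeneration happens in general cases. Thus in the rest of the proof we will
only consider probability distributions that only have positive elements.

It remains to show (22) and (23) are the desired solution. It
is easy to observe that $\alpha_k \in [0, 1]$ for $1 \le k \le K + 1$, and

$$\sum_{k=1}^{K+1} \alpha_k = \alpha_1 + (1 - \alpha_1) \sum_{k=2}^{K+1} \gamma_k = \alpha_1 + (1 - \alpha_1) = 1.$$

By rearranging terms in (22) and using (23), we have

$$\frac{\alpha_1 \chi_1}{\sum_{k=1}^{K+1} \alpha_k \chi_k} = \frac{\alpha_1 \chi_1}{\alpha_1 \chi_1 + (1 - \alpha_1) \sum_{k=2}^{K+1} \gamma_k \chi_k} = \beta_1. \qquad (24)$$

For $2 \le k \le K + 1$,

$$\frac{\alpha_k \chi_k}{\sum_{k=1}^{K+1} \alpha_k \chi_k} = \frac{\alpha_k \chi_k}{\sum_{k=2}^{K+1} \alpha_k \chi_k} \cdot \frac{\sum_{k=2}^{K+1} \alpha_k \chi_k}{\sum_{k=1}^{K+1} \alpha_k \chi_k} \overset{(a)}{=} \frac{(1 - \alpha_1) \gamma_k \chi_k}{\sum_{k=2}^{K+1} (1 - \alpha_1) \gamma_k \chi_k} \left( 1 - \frac{\alpha_1 \chi_1}{\sum_{k=1}^{K+1} \alpha_k \chi_k} \right) \overset{(b)}{=} \frac{\gamma_k \chi_k}{\sum_{k=2}^{K+1} \gamma_k \chi_k} (1 - \beta_1) \overset{(c)}{=} \frac{\beta_k}{\sum_{k=2}^{K+1} \beta_k} (1 - \beta_1) \overset{(d)}{=} \beta_k,$$

where (a) is by plugging in (23), (b) uses (24), (c) uses (21),
and (d) is by $\sum_{k=1}^{K+1} \beta_k = 1$. The proof is complete. ■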
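The system (20) can also be checked numerically. The sketch below does not follow the inductive construction of the proof; it uses the direct normalization $\alpha_k \propto \beta_k / \chi_k$, which is an observation added here and which agrees with the two-policy formula above. The function names and the sample values of $\chi_k$ are hypothetical.

```python
def alpha_from_beta(beta, chi):
    """Solve beta_k = alpha_k chi_k / sum_j alpha_j chi_j for alpha.

    Uses alpha_k proportional to beta_k / chi_k, assuming every chi_k > 0.
    """
    weights = [b / c for b, c in zip(beta, chi)]
    total = sum(weights)
    return [w / total for w in weights]


def beta_from_alpha(alpha, chi):
    """Right-hand side of (20): the time fractions induced by alpha."""
    total = sum(a * c for a, c in zip(alpha, chi))
    return [a * c / total for a, c in zip(alpha, chi)]


beta = [0.2, 0.3, 0.5]      # target fractions of time (illustrative)
chi = [1.5, 2.0, 4.0]       # expected round lengths (illustrative)
alpha = alpha_from_beta(beta, chi)
recovered = beta_from_alpha(alpha, chi)
```

Intuitively, a round robin policy with a longer expected round must be selected less often to occupy the same fraction of time, which is why each $\beta_k$ is divided by $\chi_k$.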



Appendix E 

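Proof of Lemma 7: Let $\mathcal{N}_1(T) \subseteq \{0, 1, \ldots, T - 1\}$ be
the subset of time instants in which Y(t) = 1. Note that
$\sum_{t=0}^{T-1} Y(t) = |\mathcal{N}_1(T)|$. For each $t \in \mathcal{N}_1(T)$, let $1_{[1 \to 0]}(t)$
be an indicator function which is 1 if Y(t) transits from 1 to
0 at time t, and 0 otherwise. We define $\mathcal{N}_0(T)$ and $1_{[0 \to 1]}(t)$
similarly.

In {0, 1, ..., T − 1}, since state transitions of {Y(t)} from
1 to 0 and from 0 to 1 differ by at most 1, we have

$$\left| \sum_{t \in \mathcal{N}_1(T)} 1_{[1 \to 0]}(t) - \sum_{t \in \mathcal{N}_0(T)} 1_{[0 \to 1]}(t) \right| \le 1, \qquad (25)$$

which is true for all T. Dividing (25) by T, we get

$$\left| \frac{1}{T} \sum_{t \in \mathcal{N}_1(T)} 1_{[1 \to 0]}(t) - \frac{1}{T} \sum_{t \in \mathcal{N}_0(T)} 1_{[0 \to 1]}(t) \right| \le \frac{1}{T}. \qquad (26)$$

Consider the subsequence $\{T_k\}$ such that

$$\lim_{k \to \infty} \frac{1}{T_k} \sum_{t=0}^{T_k - 1} Y(t) = \pi_Y(1) = \lim_{k \to \infty} \frac{|\mathcal{N}_1(T_k)|}{T_k}. \qquad (27)$$

Note that $\{T_k\}$ exists because $(1/T) \sum_{t=0}^{T-1} Y(t)$ is a bounded
sequence indexed by integers T. Moreover, there exists a
subsequence $\{T_n\}$ of $\{T_k\}$ so that each of the two averages
in (26) has a limit point with respect to $\{T_n\}$, because they are
bounded sequences, too. In the rest of the proof we will work
on $\{T_n\}$, but we drop subscript n for notational simplicity.

Passing $T \to \infty$, we get from (26) that

$$\pi_Y(1)\, \beta \overset{(a)}{=} \lim_{T \to \infty} \frac{|\mathcal{N}_1(T)|}{T} \cdot \frac{1}{|\mathcal{N}_1(T)|} \sum_{t \in \mathcal{N}_1(T)} 1_{[1 \to 0]}(t) = \lim_{T \to \infty} \frac{|\mathcal{N}_0(T)|}{T} \cdot \frac{1}{|\mathcal{N}_0(T)|} \sum_{t \in \mathcal{N}_0(T)} 1_{[0 \to 1]}(t) \overset{(b)}{=} (1 - \pi_Y(1))\, \gamma, \qquad (28)$$

where $\beta$ and $\gamma$ denote the limiting fractions of visits to state 1 and
state 0, respectively, that are followed by a transition to the other state,
(a) is by (27), and (b) is by $|\mathcal{N}_1(T)| + |\mathcal{N}_0(T)| = T$.
From (28) we get

$$\pi_Y(1) = \frac{\gamma}{\beta + \gamma}.$$

The next lemma, proved in Appendix F, helps to show $\gamma \le
P_{01}$.

Lemma 12 (Stochastic coupling of random binary sequences).
Let $\{I_n\}_{n=1}^{\infty}$ be an infinite sequence of binary random
variables. Suppose for all $n \in \{1, 2, \ldots\}$ we have

$$\Pr[I_n = 1 \mid I_1 = i_1, \ldots, I_{n-1} = i_{n-1}] \le P_{01} \qquad (29)$$

for all possible values of $i_1, \ldots, i_{n-1}$. Then we can construct
a new sequence $\{\hat{I}_n\}_{n=1}^{\infty}$ of binary random variables that are
i.i.d. with $\Pr[\hat{I}_n = 1] = P_{01}$ for all n and satisfy $\hat{I}_n \ge I_n$
for all n. Consequently, we have

$$\limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} I_n \le \limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \hat{I}_n = P_{01}.$$

To use Lemma 12 to prove $\gamma \le P_{01}$, let $t_n$ denote the nth
time Y(t) = 0 and let $I_n = 1_{[0 \to 1]}(t_n)$. For simplicity assume
$\{t_n\}$ is an infinite sequence so that state 0 is visited infinitely
often in {Y(t)}. By the assumption that $Q_{01}(t) \le P_{01}$ for all
t, we know (29) holds. Therefore by Lemma 12, we have

$$\gamma \le \limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} 1_{[0 \to 1]}(t_n) \le P_{01}.$$

Similarly to Lemma 12, we can show $\beta \ge P_{10}$ by stochastic
coupling. Therefore

$$\pi_Y(1) = \frac{\gamma}{\beta + \gamma} \le \frac{\gamma}{P_{10} + \gamma} \le \frac{P_{01}}{P_{10} + P_{01}} = \pi_X(1). \qquad ■$$

Appendix F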



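Proof of Lemma 12: For simplicity, we assume

$$\Pr[I_n = 0 \mid I_1 = i_1, \ldots, I_{n-1} = i_{n-1}] > 0$$

for all n and all possible values of $i_1, \ldots, i_{n-1}$. For each
$n \in \{1, 2, \ldots\}$, define $\hat{I}_n$ as follows: If $I_n = 1$, define $\hat{I}_n = 1$.
If $I_n = 0$, observe the history $I_1^{n-1} \triangleq (I_1, \ldots, I_{n-1})$ and
independently choose $\hat{I}_n$ as follows:

$$\hat{I}_n = \begin{cases} 1 & \text{with prob. } \dfrac{P_{01} - \Pr[I_n = 1 \mid I_1^{n-1}]}{\Pr[I_n = 0 \mid I_1^{n-1}]}, \\[2ex] 0 & \text{with prob. } 1 - \dfrac{P_{01} - \Pr[I_n = 1 \mid I_1^{n-1}]}{\Pr[I_n = 0 \mid I_1^{n-1}]}. \end{cases} \qquad (30)$$

The probabilities in (30) are well-defined because $P_{01} \ge
\Pr[I_n = 1 \mid I_1^{n-1}]$ by (29) and

$$P_{01} \le 1 = \Pr[I_n = 1 \mid I_1^{n-1}] + \Pr[I_n = 0 \mid I_1^{n-1}]$$

and therefore

$$P_{01} - \Pr[I_n = 1 \mid I_1^{n-1}] \le \Pr[I_n = 0 \mid I_1^{n-1}].$$

With the above definition of $\hat{I}_n$, we have $\hat{I}_n = 1$ whenever
$I_n = 1$. Therefore $\hat{I}_n \ge I_n$ for all n. Further, for any n and
any binary vector $i_1^{n-1} \triangleq (i_1, \ldots, i_{n-1})$, we have

$$\Pr[\hat{I}_n = 1 \mid I_1^{n-1} = i_1^{n-1}] = \Pr[I_n = 1 \mid I_1^{n-1} = i_1^{n-1}] + \Pr[I_n = 0 \mid I_1^{n-1} = i_1^{n-1}] \cdot \frac{P_{01} - \Pr[I_n = 1 \mid I_1^{n-1} = i_1^{n-1}]}{\Pr[I_n = 0 \mid I_1^{n-1} = i_1^{n-1}]} = P_{01}. \qquad (31)$$

Therefore, for all n we have

$$\Pr[\hat{I}_n = 1] = \sum_{i_1^{n-1}} \Pr[\hat{I}_n = 1 \mid I_1^{n-1} = i_1^{n-1}] \Pr[I_1^{n-1} = i_1^{n-1}] = P_{01},$$

and thus the $\hat{I}_n$ variables are identically distributed. It remains
to prove that they are independent.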

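Suppose components in $\hat{I}_1^n = (\hat{I}_1, \ldots, \hat{I}_n)$ are independent. We prove that components in $\hat{I}_1^{n+1} = (\hat{I}_1, \ldots, \hat{I}_{n+1})$ are also independent. For any binary vector $i_1^{n+1} = (i_1, \ldots, i_{n+1})$, since

$$\begin{aligned} \Pr\left[\hat{I}_1^{n+1} = i_1^{n+1}\right] &= \Pr\left[\hat{I}_{n+1} = i_{n+1} \mid \hat{I}_1^n = i_1^n\right] \Pr\left[\hat{I}_1^n = i_1^n\right] \\ &= \Pr\left[\hat{I}_{n+1} = i_{n+1} \mid \hat{I}_1^n = i_1^n\right] \prod_{k=1}^{n} \Pr\left[\hat{I}_k = i_k\right], \end{aligned}$$

it suffices to show

$$\Pr\left[\hat{I}_{n+1} = 1 \mid \hat{I}_1^n = i_1^n\right] = \Pr\left[\hat{I}_{n+1} = 1\right].$$

Indeed,

$$\begin{aligned} \Pr\left[\hat{I}_{n+1} = 1 \mid \hat{I}_1^n = i_1^n\right] &= \sum_{j_1^n} \Pr\left[\hat{I}_{n+1} = 1 \mid I_1^n = j_1^n, \hat{I}_1^n = i_1^n\right] \Pr\left[I_1^n = j_1^n \mid \hat{I}_1^n = i_1^n\right] \\ &\overset{(a)}{=} \sum_{j_1^n} P_{01} \Pr\left[I_1^n = j_1^n \mid \hat{I}_1^n = i_1^n\right] \\ &= P_{01}, \end{aligned}$$

where (a) is by (31), and the proof is complete. ■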
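The coupling construction in (30) can be illustrated numerically. The following is a minimal Python sketch, not part of the original proof: the function names `coupled_sequences` and `example_rule`, the history-dependent rule itself, and the value $P_{01} = 0.4$ are all assumptions chosen for illustration. It generates a binary sequence whose conditional success probability never exceeds $P_{01}$, builds the coupled sequence, and lets one check that the coupled sequence dominates the original and has empirical mean close to $P_{01}$.

```python
import random


def coupled_sequences(p01, cond_prob, n_steps, rng):
    """Generate (I, I_hat) following the construction in (30).

    cond_prob(history) is a hypothetical callback returning
    Pr[I_n = 1 | history]; it must satisfy q <= p01 (condition (29))
    and q < 1 so that Pr[I_n = 0 | history] > 0.
    """
    I, I_hat = [], []
    for _ in range(n_steps):
        q = cond_prob(I)
        if not (0.0 <= q <= p01 and q < 1.0):
            raise ValueError("need 0 <= q <= p01 and q < 1")
        i_n = 1 if rng.random() < q else 0
        if i_n == 1:
            # Whenever I_n = 1, the coupled variable is also 1.
            i_hat = 1
        else:
            # Otherwise flip to 1 with probability (p01 - q) / (1 - q),
            # which makes Pr[I_hat_n = 1 | history] equal to p01.
            flip = (p01 - q) / (1.0 - q)
            i_hat = 1 if rng.random() < flip else 0
        I.append(i_n)
        I_hat.append(i_hat)
    return I, I_hat


def example_rule(history):
    """Illustrative history-dependent rule (an assumption, not from the paper)."""
    return 0.3 if history and history[-1] == 1 else 0.1


if __name__ == "__main__":
    I, I_hat = coupled_sequences(0.4, example_rule, 20000, random.Random(0))
    print("dominates:", all(b >= a for a, b in zip(I, I_hat)))
    print("empirical mean of I_hat:", sum(I_hat) / len(I_hat))
```

The domination holds on every sample path by construction, while the closeness of the empirical mean to $P_{01}$ is only a statistical check of the i.i.d. Bernoulli claim.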
Appendix G 

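Proof of Lemma: By definition of $d(v)$, there exists a nonempty subset $A \subseteq \Phi_{d(v)}$ and for every $\phi \in A$ a positive real number $\beta_\phi > 0$, such that $v = \sum_{\phi \in A} \beta_\phi \phi$. For each $\phi \in A$, we have $M(\phi) = d(v)$ and thus $c_{M(\phi)} = c_{d(v)}$. Define

$$\hat{\beta}_\phi \triangleq \frac{\beta_\phi}{\sum_{\phi \in A} \beta_\phi}$$

for each $\phi \in A$, and $\{\hat{\beta}_\phi\}_{\phi \in A}$ is a probability distribution. Consider a RandRR policy that in every round selects $\phi \in A$ with probability $\hat{\beta}_\phi$. By Lemma 4 this RandRR policy achieves throughput vector $\lambda = (\lambda_1, \ldots, \lambda_N)$ that satisfies

$$\lambda = \sum_{\phi \in A} \hat{\beta}_\phi \frac{c_{d(v)}}{d(v)} \phi = \left( \frac{c_{d(v)}}{d(v) \sum_{\phi \in A} \beta_\phi} \right) v,$$

which is in the direction of $v$. In addition, the sum throughput

$$\sum_{n=1}^{N} \lambda_n = \sum_{\phi \in A} \hat{\beta}_\phi \frac{c_{d(v)}}{d(v)} \left( \sum_{n=1}^{N} \phi_n \right) = \sum_{\phi \in A} \hat{\beta}_\phi c_{d(v)} = c_{d(v)}$$

is achieved.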
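The normalization step in this proof can be checked on a small numerical instance. The sketch below is only an illustration under stated assumptions: the function name `randrr_mixture`, the weights, and the value $c_{d(v)} = 0.6$ are hypothetical, and the per-policy throughput $(c_{d(v)}/d(v))\,\phi$ is taken as given from the lemma cited in the proof rather than derived here.

```python
def randrr_mixture(betas, c_d):
    """Mix round robin policies with probabilities proportional to betas.

    betas: dict mapping binary tuples phi (all with the same number d of
           ones) to positive weights beta_phi.
    c_d:   assumed sum throughput of any RR(phi) with d active channels.
    Returns (beta_hat, lam, v), where v = sum(beta_phi * phi) and
    lam = sum(beta_hat_phi * (c_d / d) * phi).
    """
    phis = list(betas)
    d = sum(phis[0])
    if d == 0 or any(sum(phi) != d for phi in phis):
        raise ValueError("all phi must have the same positive number of ones")
    n = len(phis[0])
    total = sum(betas.values())
    # Normalize the weights into a probability distribution.
    beta_hat = {phi: b / total for phi, b in betas.items()}
    v = [sum(b * phi[i] for phi, b in betas.items()) for i in range(n)]
    lam = [sum(bh * (c_d / d) * phi[i] for phi, bh in beta_hat.items())
           for i in range(n)]
    return beta_hat, lam, v


if __name__ == "__main__":
    beta_hat, lam, v = randrr_mixture({(1, 1, 0): 2.0, (0, 1, 1): 1.0}, 0.6)
    print(v, lam, sum(lam))
```

In this instance $v = (2, 3, 1)$ and the resulting throughput vector is a scalar multiple of $v$ whose entries sum to the assumed $c_{d(v)}$, matching the two displayed identities.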



Appendix H 

Proof of Theorem: (A Related RandRR Policy) For each randomized round robin policy RandRR, it is useful to consider a renewal reward process where renewal epochs are defined as time instants at which RandRR starts a new round of transmission.† Let $T$ denote the renewal period. We say one unit of reward is earned by a user if RandRR serves a packet to that user. Let $R_n$ denote the sum reward earned by user $n$ in one renewal period $T$, representing the number of successful transmissions user $n$ receives in one round of scheduling. Conditioning on the round robin policy $\mathrm{RR}(\phi)$ chosen by RandRR for the current round of transmission, we have from Corollary 1

$$E[T] = \sum_{\phi \in \Phi} \alpha_\phi E[T \mid \mathrm{RR}(\phi)] \tag{32}$$

$$E[T \mid \mathrm{RR}(\phi)] = \sum_{n : \phi_n = 1} E\left[L_{kn}^{\phi}\right] \tag{33}$$

and for all $n \in \{1, 2, \ldots, N\}$,

$$E[R_n] = \sum_{\phi \in \Phi} \alpha_\phi E[R_n \mid \mathrm{RR}(\phi)] \tag{34}$$

$$E[R_n \mid \mathrm{RR}(\phi)] = \begin{cases} E\left[L_{kn}^{\phi}\right] - 1 & \text{if } \phi_n = 1 \\ 0 & \text{if } \phi_n = 0. \end{cases} \tag{35}$$

† We note that the renewal reward process is defined solely with respect to RandRR, and is only used to facilitate our analysis. At these renewal epochs, the state of the network, including the current queue state $U(t)$, does not necessarily renew itself.



Consider the round robin policy $\mathrm{RR}((1, 1, \ldots, 1))$ that serves all $N$ channels in one round. We define $T_{\max}$ as its renewal period. From Corollary 1 we know $E[T_{\max}] < \infty$ and $E[(T_{\max})^2] < \infty$. Further, for any RandRR, including using a $\mathrm{RR}(\phi)$ policy in every round as special cases, we can show that $T_{\max}$ is stochastically larger than the renewal period $T$, and $(T_{\max})^2$ is stochastically larger than $T^2$. It follows that

$$E[T] \le E[T_{\max}], \qquad E[T^2] \le E[(T_{\max})^2]. \tag{36}$$

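We have denoted by RandRR* (in the discussion after (13)) the randomized round robin policy that achieves a service rate vector strictly larger than the target arrival rate vector $\lambda$ entrywise. Let $T^*$ denote the renewal period of RandRR*, and $R_n^*$ the sum reward (the number of successful transmissions) received by user $n$ over the renewal period $T^*$. Then we have

$$\begin{aligned} \frac{E[R_n^*]}{E[T^*]} &\overset{(a)}{=} \frac{\sum_{\phi \in \Phi} \alpha_\phi E[R_n^* \mid \mathrm{RR}(\phi)]}{\sum_{\phi \in \Phi} \alpha_\phi E[T^* \mid \mathrm{RR}(\phi)]} \\ &\overset{(b)}{=} \sum_{\phi \in \Phi} \left( \frac{\alpha_\phi E[T^* \mid \mathrm{RR}(\phi)]}{\sum_{\phi' \in \Phi} \alpha_{\phi'} E[T^* \mid \mathrm{RR}(\phi')]} \right) \frac{E[R_n^* \mid \mathrm{RR}(\phi)]}{E[T^* \mid \mathrm{RR}(\phi)]} \\ &\overset{(c)}{=} \sum_{\phi \in \Phi} \beta_\phi \frac{E[R_n^* \mid \mathrm{RR}(\phi)]}{E[T^* \mid \mathrm{RR}(\phi)]} \\ &\overset{(d)}{=} \sum_{\phi \in \Phi} \beta_\phi \eta_n^{\phi} \overset{(e)}{\ge} \lambda_n + \epsilon, \end{aligned} \tag{37}$$

where (a) is by (32) and (34), (b) is by rearranging terms, (c) is by plugging (33) into (19), (d) is by plugging (33) and (35) into (9) in Section IV-B, and (e) is by (13). From (37) we get

$$E[R_n^*] \ge (\lambda_n + \epsilon) E[T^*], \quad \text{for all } n \in \{1, \ldots, N\}. \tag{38}$$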



(Lyapunov Drift) From (12), in a frame of size $T$ (which is possibly random), we can show that for all $n$

$$U_n(t + T) \le \max\left[ U_n(t) - \sum_{\tau=0}^{T-1} \mu_n(t + \tau),\ 0 \right] + \sum_{\tau=0}^{T-1} a_n(t + \tau). \tag{39}$$

We define a Lyapunov function $L(U(t)) \triangleq (1/2) \sum_{n=1}^{N} U_n^2(t)$ and the $T$-slot Lyapunov drift

$$\Delta_T(U(t)) \triangleq E\left[ L(U(t + T)) - L(U(t)) \mid U(t) \right],$$

where in the last term the expectation is with respect to the randomness of the whole network in frame $T$, including the randomness of $T$. By taking square of (39) and then conditional expectation on $U(t)$, we can show

$$\Delta_T(U(t)) \le \frac{1}{2} N (1 + A_{\max}^2) E\left[ T^2 \mid U(t) \right] - E\left[ \sum_{n=1}^{N} U_n(t) \sum_{\tau=0}^{T-1} \left( \mu_n(t + \tau) - a_n(t + \tau) \right) \,\Big|\, U(t) \right]. \tag{40}$$

Define $f(U(t), \theta)$ as the expectation in the last term of (40), where $\theta$ represents a scheduling policy that controls the service rates $\mu_n(t + \tau)$ and the frame size $T$. In the following analysis, we only consider $\theta$ in the class of RandRR policies, and the frame size $T$ is the renewal period of a RandRR policy. By (36), the first term on the right-hand side of (40) is less than or equal to the constant $B_1 \triangleq (1/2) N (1 + A_{\max}^2) E[(T_{\max})^2] < \infty$. It follows that

$$\Delta_T(U(t)) \le B_1 - f(U(t), \theta). \tag{41}$$



In $f(U(t), \theta)$, it is useful to consider $\theta = \text{RandRR*}$ and $T$ is the renewal period $T^*$ of RandRR*. Assume $t$ is the beginning of a renewal period. For each $n \in \{1, 2, \ldots, N\}$, because $R_n^*$ is the number of successful transmissions user $n$ receives in the renewal period $T^*$, we have

$$E\left[ \sum_{\tau=0}^{T^*-1} \mu_n(t + \tau) \,\Big|\, U(t) \right] = E[R_n^*].$$

Combining with (38), we get

$$E\left[ \sum_{\tau=0}^{T^*-1} \mu_n(t + \tau) \,\Big|\, U(t) \right] \ge (\lambda_n + \epsilon) E[T^*]. \tag{42}$$

By the assumption that packet arrivals are i.i.d. over slots and independent of the current queue backlogs, we have for all $n$

$$E\left[ \sum_{\tau=0}^{T^*-1} a_n(t + \tau) \,\Big|\, U(t) \right] = \lambda_n E[T^*]. \tag{43}$$

Plugging (42) and (43) into $f(U(t), \text{RandRR*})$, we get

$$f(U(t), \text{RandRR*}) \ge \epsilon E[T^*] \sum_{n=1}^{N} U_n(t). \tag{44}$$

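It is also useful to consider $\theta$ as a round robin policy $\mathrm{RR}(\phi)$ for some $\phi \in \Phi$. In this case frame size $T$ is the renewal period $T^{\phi}$ of $\mathrm{RR}(\phi)$ (note that $\mathrm{RR}(\phi)$ is a special case of RandRR). From Corollary 1 we have

$$E\left[ T^{\phi} \mid U(t) \right] = E\left[ T^{\phi} \right] = \sum_{n : \phi_n = 1} E\left[ L_{kn}^{\phi} \right], \tag{45}$$

where $E[L_{kn}^{\phi}]$ can be expanded by (8). Let $t$ be the beginning of a transmission round. If channel $n$ is active, we have

$$E\left[ \sum_{\tau=0}^{T^{\phi}-1} \mu_n(t + \tau) \,\Big|\, U(t) \right] = E\left[ L_{kn}^{\phi} \right] - 1,$$

and $0$ otherwise. It follows that

$$\begin{aligned} f(U(t), \mathrm{RR}(\phi)) &= \sum_{n : \phi_n = 1} U_n(t) E\left[ L_{kn}^{\phi} - 1 \right] - E\left[ T^{\phi} \right] \sum_{n=1}^{N} U_n(t) \lambda_n \\ &\overset{(a)}{=} \sum_{n : \phi_n = 1} \left[ U_n(t) E\left[ L_{kn}^{\phi} - 1 \right] - E\left[ L_{kn}^{\phi} \right] \sum_{m=1}^{N} U_m(t) \lambda_m \right], \end{aligned} \tag{46}$$

where (a) is by (45) and rearranging terms.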

(Design of QRR) Given the current queue backlogs $U(t)$, we are interested in the policy that maximizes $f(U(t), \theta)$ over all RandRR policies in one round of transmission. Although the RandRR policy space is uncountably large and thus searching for the optimal solution could be difficult, next we show that the optimal solution is a round robin policy $\mathrm{RR}(\phi)$ for some $\phi \in \Phi$ and can be found by maximizing $f(U(t), \mathrm{RR}(\phi))$ in (46) over $\phi \in \Phi$. To see this, we denote by $\phi(t)$ the binary vector associated with the $\mathrm{RR}(\phi)$ policy that maximizes $f(U(t), \mathrm{RR}(\phi))$ over $\phi \in \Phi$, and we have

$$f(U(t), \mathrm{RR}(\phi(t))) \ge f(U(t), \mathrm{RR}(\phi)), \quad \text{for all } \phi \in \Phi. \tag{47}$$

For any RandRR policy, conditioning on the policy $\mathrm{RR}(\phi)$ chosen for the current round of scheduling, we have

$$f(U(t), \text{RandRR}) = \sum_{\phi \in \Phi} \alpha_\phi f(U(t), \mathrm{RR}(\phi)), \tag{48}$$

where $\{\alpha_\phi\}_{\phi \in \Phi}$ is the probability distribution associated with RandRR. By (47) and (48), for any RandRR we get

$$f(U(t), \mathrm{RR}(\phi(t))) \ge \sum_{\phi \in \Phi} \alpha_\phi f(U(t), \mathrm{RR}(\phi)) = f(U(t), \text{RandRR}). \tag{49}$$

We note that as long as the queue backlog vector $U(t)$ is not identically zero and the arrival rate vector $\lambda$ is strictly within the inner capacity bound $\Lambda_{\mathrm{int}}$, we get

$$\max_{\phi \in \Phi} f(U(t), \mathrm{RR}(\phi)) \overset{(a)}{=} f(U(t), \mathrm{RR}(\phi(t))) \overset{(b)}{\ge} f(U(t), \text{RandRR*}) \overset{(c)}{>} 0, \tag{50}$$

where (a) is from the definition of $\phi(t)$, (b) from (49), and (c) from (44).

The policy QRR is designed to be a frame-based algorithm where at the beginning of each round we observe the current queue backlog vector $U(t)$, find the binary vector $\phi(t)$ whose associated round robin policy $\mathrm{RR}(\phi(t))$ maximizes $f(U(t), \text{RandRR})$ over RandRR policies, and execute $\mathrm{RR}(\phi(t))$ for one round of transmission. We emphasize that in every transmission round of QRR, active channels are served in the order that the least recently used channel is served first, and the ordering may change from one round to another.
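The per-round selection step of QRR can be sketched as a brute-force search over binary vectors using the expression for $f(U(t), \mathrm{RR}(\phi))$ in (46). The Python below is a minimal illustration, not the authors' implementation: the names `f_rr` and `qrr_choice` are hypothetical, the callback `mean_L(phi, n)` stands in for the expected service duration $E[L_{kn}^{\phi}]$ that (8) would supply, and excluding the all-zero vector from the search is an assumption. The enumeration costs $2^N - 1$ evaluations, so it is only meant to make the objective concrete.

```python
from itertools import product


def f_rr(U, lam, phi, mean_L):
    """Evaluate f(U, RR(phi)) following the first line of (46).

    U:      list of current queue backlogs U_n(t).
    lam:    list of arrival rates lambda_n.
    phi:    binary tuple; phi[n] == 1 means channel n is active.
    mean_L: hypothetical callback returning E[L^phi_{kn}] for active n.
    """
    active = [n for n, bit in enumerate(phi) if bit]
    frame = sum(mean_L(phi, n) for n in active)  # E[T^phi], as in (45)
    service = sum(U[n] * (mean_L(phi, n) - 1.0) for n in active)
    arrivals = frame * sum(u * l for u, l in zip(U, lam))
    return service - arrivals


def qrr_choice(U, lam, mean_L):
    """Return the nonzero binary vector maximizing f(U, RR(phi))."""
    candidates = [phi for phi in product((0, 1), repeat=len(U)) if any(phi)]
    return max(candidates, key=lambda phi: f_rr(U, lam, phi, mean_L))


if __name__ == "__main__":
    # Toy inputs: every active channel is assumed to be held for 2 slots
    # on average, regardless of phi (purely illustrative).
    const_L = lambda phi, n: 2.0
    U, lam = [10.0, 0.0, 0.0], [0.1, 0.1, 0.1]
    best = qrr_choice(U, lam, const_L)
    print(best, f_rr(U, lam, best, const_L))
```

With the toy inputs, only the first queue is backlogged, so activating any other channel lengthens the frame (and hence the arrival term) without adding weighted service, and the search returns the vector that activates channel 1 alone.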

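(Stability Analysis) Again, policy QRR comprises a sequence of transmission rounds, where in each round QRR finds and executes policy $\mathrm{RR}(\phi(t))$ for one round, and different $\phi(t)$ may be used in different rounds. In the $k$th round, let $T_k^{\mathrm{QRR}}$ denote its time duration. Define $t_k = \sum_{i=1}^{k} T_i^{\mathrm{QRR}}$ for all $k \in \mathbb{N}$ and note that $t_k - t_{k-1} = T_k^{\mathrm{QRR}}$. Let $t_0 = 0$. Then for each $k \in \mathbb{N}$, from (41) we have

$$\begin{aligned} \Delta_{T_k^{\mathrm{QRR}}}(U(t_{k-1})) &\overset{(a)}{\le} B_1 - f(U(t_{k-1}), \text{QRR}) \\ &\overset{(b)}{\le} B_1 - f(U(t_{k-1}), \text{RandRR*}) \\ &\overset{(c)}{\le} B_1 - \epsilon E[T^*] \sum_{n=1}^{N} U_n(t_{k-1}), \end{aligned} \tag{51}$$

where (a) is by (41), (b) is because QRR is the maximizer of $f(U(t_{k-1}), \text{RandRR})$ over all RandRR policies, and (c) is by (44). By taking expectation over $U(t_{k-1})$ in (51) and noting that $E[T^*] \ge 1$, for all $k \in \mathbb{N}$ we get

$$\begin{aligned} E[L(U(t_k))] - E[L(U(t_{k-1}))] &\le B_1 - \epsilon E[T^*] \sum_{n=1}^{N} E[U_n(t_{k-1})] \\ &\le B_1 - \epsilon \sum_{n=1}^{N} E[U_n(t_{k-1})]. \end{aligned} \tag{52}$$

Summing (52) over $k \in \{1, 2, \ldots, K\}$, we have

$$E[L(U(t_K))] - E[L(U(t_0))] \le K B_1 - \epsilon \sum_{k=1}^{K} \sum_{n=1}^{N} E[U_n(t_{k-1})].$$

Since $U(t_K) \ge 0$ entrywise and by assumption $U(t_0) = U(0) = 0$, we have

$$\epsilon \sum_{k=1}^{K} \sum_{n=1}^{N} E[U_n(t_{k-1})] \le K B_1. \tag{53}$$

Dividing (53) by $\epsilon K$ and passing $K \to \infty$, we get

$$\limsup_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \sum_{n=1}^{N} E[U_n(t_{k-1})] \le \frac{B_1}{\epsilon} < \infty. \tag{54}$$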



Equation (54) shows that the network is stable when sampled at renewal time instants $\{t_k\}$. That it is also stable when sampled over all time follows because $T_k^{\mathrm{QRR}}$, the renewal period of the $\mathrm{RR}(\phi)$ policy chosen in the $k$th round of QRR, has finite first and second moments for all $k$ (see (36)), and in every slot the number of packet arrivals to a user is bounded. These details are provided in Lemma 13, which is proved in Appendix I.

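Lemma 13. Given that

$$E\left[ T_k^{\mathrm{QRR}} \right] \le E[T_{\max}], \qquad E\left[ (T_k^{\mathrm{QRR}})^2 \right] \le E\left[ (T_{\max})^2 \right] \tag{55}$$

for all $k \in \{1, 2, \ldots\}$, packet arrivals to a user are bounded by $A_{\max}$ in every slot, and the network sampled at renewal epochs $\{t_k\}$ is stable from (54), we have

$$\limsup_{K \to \infty} \frac{1}{t_K} \sum_{\tau=0}^{t_K - 1} \sum_{n=1}^{N} E[U_n(\tau)] < \infty.$$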



Appendix I 

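Proof of Lemma 13: In $[t_{k-1}, t_k)$, it is easy to see for all $n \in \{1, \ldots, N\}$

$$U_n(t_{k-1} + \tau) \le U_n(t_{k-1}) + \tau A_{\max}, \quad 0 \le \tau < T_k^{\mathrm{QRR}}. \tag{56}$$

Summing (56) over $\tau \in \{0, 1, \ldots, T_k^{\mathrm{QRR}} - 1\}$, we get

$$\sum_{\tau=0}^{T_k^{\mathrm{QRR}} - 1} U_n(t_{k-1} + \tau) \le T_k^{\mathrm{QRR}} U_n(t_{k-1}) + (T_k^{\mathrm{QRR}})^2 A_{\max} / 2. \tag{57}$$

Summing (57) over $k \in \{1, 2, \ldots, K\}$ and noting that $t_K = \sum_{k=1}^{K} T_k^{\mathrm{QRR}}$, we have

$$\sum_{\tau=0}^{t_K - 1} U_n(\tau) = \sum_{k=1}^{K} \sum_{\tau=0}^{T_k^{\mathrm{QRR}} - 1} U_n(t_{k-1} + \tau) \overset{(a)}{\le} \sum_{k=1}^{K} \left[ T_k^{\mathrm{QRR}} U_n(t_{k-1}) + (T_k^{\mathrm{QRR}})^2 A_{\max} / 2 \right], \tag{58}$$

where (a) is by (57). Taking expectation of (58) and dividing it by $t_K$, we have

$$\frac{1}{t_K} \sum_{\tau=0}^{t_K - 1} E[U_n(\tau)] \overset{(a)}{\le} \frac{1}{K} \sum_{\tau=0}^{t_K - 1} E[U_n(\tau)] \overset{(b)}{\le} \frac{1}{K} \sum_{k=1}^{K} E\left[ T_k^{\mathrm{QRR}} U_n(t_{k-1}) + (T_k^{\mathrm{QRR}})^2 A_{\max} / 2 \right], \tag{59}$$

where (a) follows from $t_K \ge K$ and (b) is by (58). Next, we have

$$\begin{aligned} E\left[ T_k^{\mathrm{QRR}} U_n(t_{k-1}) \right] &= E\left[ E\left[ T_k^{\mathrm{QRR}} U_n(t_{k-1}) \mid U_n(t_{k-1}) \right] \right] \\ &\overset{(a)}{\le} E\left[ E\left[ T_{\max} U_n(t_{k-1}) \mid U_n(t_{k-1}) \right] \right] \\ &= E\left[ E[T_{\max}] U_n(t_{k-1}) \right] \\ &= E[T_{\max}] E[U_n(t_{k-1})], \end{aligned} \tag{60}$$

where (a) is because $E[T_k^{\mathrm{QRR}} \mid U_n(t_{k-1})] \le E[T_{\max}]$. Using (55) and (60) to upper bound the last term of (59), we have

$$\frac{1}{t_K} \sum_{\tau=0}^{t_K - 1} E[U_n(\tau)] \le B_2 + E[T_{\max}] \frac{1}{K} \sum_{k=1}^{K} E[U_n(t_{k-1})], \tag{61}$$

where $B_2 \triangleq \frac{1}{2} E[(T_{\max})^2] A_{\max} < \infty$. Summing (61) over $n \in \{1, \ldots, N\}$ and passing $K \to \infty$, we get

$$\begin{aligned} \limsup_{K \to \infty} \frac{1}{t_K} \sum_{\tau=0}^{t_K - 1} \sum_{n=1}^{N} E[U_n(\tau)] &\le N B_2 + E[T_{\max}] \left( \limsup_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \sum_{n=1}^{N} E[U_n(t_{k-1})] \right) \\ &\overset{(a)}{\le} N B_2 + E[T_{\max}] B_1 / \epsilon < \infty, \end{aligned}$$

where (a) is by (54). The proof is complete. ■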
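The step from (56) to (57) rests on the arithmetic identity $\sum_{\tau=0}^{T-1} \tau = T(T-1)/2 \le T^2/2$. As a quick sanity check, the short Python sketch below compares the worst-case within-frame backlog sum allowed by (56) against the bound in (57). The function name `frame_backlog_bound` and the sample numbers are assumptions used only for illustration.

```python
def frame_backlog_bound(u_start, frame_len, a_max):
    """Compare the worst case implied by (56) with the bound in (57).

    u_start:   backlog U_n(t_{k-1}) at the start of the frame.
    frame_len: frame duration T_k^QRR in slots (a positive integer).
    a_max:     per-slot bound A_max on packet arrivals.
    Returns (worst_case_sum, bound).
    """
    # Worst case of (56): the queue grows by a_max in every slot and is
    # never served during the frame.
    worst = sum(u_start + tau * a_max for tau in range(frame_len))
    bound = frame_len * u_start + frame_len ** 2 * a_max / 2
    return worst, bound


if __name__ == "__main__":
    print(frame_backlog_bound(5, 4, 3))
```

For a starting backlog of 5, a frame of 4 slots, and at most 3 arrivals per slot, the worst-case sum is $5 + 8 + 11 + 14 = 38$, which is below the bound $4 \cdot 5 + 16 \cdot 3 / 2 = 44$.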